A quantum ensemble $\{(p_x, \rho_x)\}$ is a set of quantum states each occurring randomly with a given 
probability. Quantum ensembles are necessary to describe situations with incomplete a priori in- 
formation, such as the output of a stochastic quantum channel (generalized measurement), and 
play a central role in quantum communication. In this paper, we propose measures of distance and 
fidelity between two quantum ensembles. We consider two approaches: the first one is based on 
the ability to mimic one ensemble given the other one as a resource and is closely related to the 
Monge-Kantorovich optimal transportation problem, while the second one uses the idea of extended- 
Hilbert-space (EHS) representations which introduce auxiliary pointer (or flag) states. Both types 
of measures enjoy a number of desirable properties. The Kantorovich measures, albeit monotonic 
under deterministic quantum operations, are not monotonic under generalized measurements. In 
contrast, the EHS measures are. We present operational interpretations for both types of mea- 
sures. We also show that the EHS fidelity between ensembles provides a novel interpretation of 
the fidelity between mixed states — the latter is equal to the maximum of the fidelity between all 
pure-state ensembles whose averages are equal to the mixed states being compared. We finally 
use the new measures to define distance and fidelity for stochastic quantum channels and positive 
operator-valued measures (POVMs). These quantities may be useful in the context of tomography 
of stochastic quantum channels and quantum detectors. 



I. INTRODUCTION 

A fundamental difference between classical and quan- 
tum systems is that, while classical states can be faith- 
fully distinguished, two generic quantum states cannot 
be distinguished with arbitrary precision by any oper- 
ational means. A natural measure that quantifies the 
similarity of two pure quantum states $|\psi\rangle$ and $|\phi\rangle$ is the 
transition probability between them, i.e., the probability 
with which the two states would yield the same outcome 
under a measurement for which one of the states is the 
unique state that yields a particular outcome with cer- 
tainty. This quantity is symmetric with respect to the 
states and is given by the square of their overlap, $|\langle\psi|\phi\rangle|^2$. 
In the case of mixed states, there is no straightforward 
analogue of the transition probability since there is no 
measurement for which a mixed state is the unique state 
that yields a particular outcome with certainty. 
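
The transition probability described above can be illustrated with a minimal sketch. The function name and the example qubit states are illustrative assumptions, not taken from the paper.

```python
import math


def transition_probability(psi, phi):
    """Return |<psi|phi>|^2 for two normalized state vectors.

    Each state is a list of complex amplitudes in the same basis.
    """
    inner = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(inner) ** 2


# Hypothetical example states: |0>, |1> and |+> = (|0> + |1>)/sqrt(2).
ket0 = [1 + 0j, 0j]
ket1 = [0j, 1 + 0j]
ket_plus = [(1 + 0j) / math.sqrt(2), (1 + 0j) / math.sqrt(2)]

print(transition_probability(ket0, ket_plus))
print(transition_probability(ket0, ket1))
```

The quantity is symmetric in its two arguments, as stated in the text, and vanishes for orthogonal states.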

A generalization of the concept of transition probability to mixed states was proposed by Uhlmann [1] and 
it is given by the maximum of the transition probability 
between two purifications of the mixed states, over all 
possible purifications. The square root of this quantity, 
which is given by the simple expression 



$$F(\rho,\sigma) = \mathrm{Tr}\sqrt{\sqrt{\sigma}\,\rho\,\sqrt{\sigma}}, \tag{1}$$

is known as the square root fidelity between two density matrices $\sigma$ and $\rho$ and has proven extremely useful 
in quantum information theory [2]. From the square root fidelity (or fidelity for short), one can define various 
distances between states, such as the Bures distance $B(\rho,\sigma) = \sqrt{1 - F(\rho,\sigma)}$ or the Bures angle 
$A(\rho,\sigma) = \arccos F(\rho,\sigma)$ [5], which can be regarded as measures of the difference between two states. 

In addition to fidelity-based measures, various other measures of distance have been proposed (see, e.g., 
Refs. [?]). The trace distance [?], 

$$\Delta(\rho,\sigma) = \frac{1}{2}\|\rho - \sigma\|, \tag{2}$$

for example, where $\|O\| = \mathrm{Tr}\sqrt{O^{\dagger}O}$ is the trace norm of 
an operator $O$, is widely used due to its simple form, various useful properties, and its operational meaning related 
to the maximum probability with which the two states $\rho$ 
and $\sigma$ can be distinguished by a measurement. 
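
To make Eqs. (1) and (2) concrete, here is a minimal sketch restricted to qubits. It does not evaluate the matrix square roots directly. Instead it assumes the standard closed forms valid for 2x2 density matrices, namely F^2 = Tr(rho sigma) + 2 sqrt(det rho det sigma) and Delta = sqrt(-det(rho - sigma)). All function names and example states are illustrative.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def trace_of_product(a, b):
    """Tr(a b) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def sqrt_fidelity(rho, sigma):
    """Square root fidelity of Eq. (1), qubit-only closed form (assumption)."""
    overlap = complex(trace_of_product(rho, sigma)).real
    dets = max(complex(det2(rho)).real * complex(det2(sigma)).real, 0.0)
    return math.sqrt(max(overlap + 2.0 * math.sqrt(dets), 0.0))


def trace_distance(rho, sigma):
    """Trace distance of Eq. (2) for qubits.

    rho - sigma is traceless and Hermitian, so its eigenvalues are
    +/- sqrt(-det), and half the trace norm equals sqrt(-det).
    """
    diff = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return math.sqrt(max(-complex(det2(diff)).real, 0.0))


def bures_distance(rho, sigma):
    """Bures distance in the normalization used in the text, sqrt(1 - F)."""
    return math.sqrt(max(1.0 - sqrt_fidelity(rho, sigma), 0.0))


def bures_angle(rho, sigma):
    """Bures angle arccos F."""
    return math.acos(min(1.0, sqrt_fidelity(rho, sigma)))


# Hypothetical example states.
pure0 = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
pure1 = [[0.0, 0.0], [0.0, 1.0]]   # |1><1|
mixed = [[0.5, 0.0], [0.0, 0.5]]   # maximally mixed state

print(sqrt_fidelity(pure0, mixed), trace_distance(pure0, mixed))
print(bures_distance(pure0, mixed), bures_angle(pure0, pure1))
```

For higher-dimensional systems a general matrix square root would be needed, which this sketch deliberately avoids.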

The problem of distinguishing two quantum states 
bears a strong similarity to the problem of distinguishing 
two classical probability distributions by looking at the 
value of a random variable sampled from one of them. 
Unless the supports of the two distributions have no over- 
lap, the probability of guessing correctly from which en- 
semble the variable was drawn is strictly smaller than 
unity. In the classical case, however, the two probabil- 
ity distributions concern the outcomes of only a single 
observable — the one corresponding to the random vari- 
able. In the quantum case, there is a continuum of pos- 
sible observations that one can perform on the systems 
and a continuum of corresponding random variables. 

Different quantum measurements establish different 
correspondence between quantum states and probabil- 
ity distributions. This suggests a natural approach to 
defining distinguishability measures between states. For 
instance, the fidelity between two quantum states is 
equal to the minimum statistical overlap between the 
probability distributions generated by all possible measurements performed on the states [?]. The statistical 
overlap in question is the Bhattacharyya coefficient $\sum_x \sqrt{P(x)Q(x)}$ between classical probability 
distributions $P(x)$ and $Q(x)$ (here $x$ is a classical random variable). Similarly, the trace distance (2) can be 
obtained by maximizing over all possible measurements the Kolmogorov distance $\sum_x \frac{1}{2}|P(x) - Q(x)|$ 
between the corresponding outcome probability distributions. As expected, in the limit of commuting density matrices, both 
the fidelity and the trace distance reduce to their classi- 
cal counterparts, i.e., to the Bhattacharyya overlap and 
the Kolmogorov distance, respectively. 
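
The two classical quantities mentioned above can be sketched as follows. The check of the commuting limit is limited to diagonal qubit states and assumes the qubit closed form F^2 = Tr(rho sigma) + 2 sqrt(det rho det sigma). The function names and the example distributions are illustrative.

```python
import math


def bhattacharyya(P, Q):
    """Bhattacharyya coefficient sum_x sqrt(P(x) Q(x))."""
    return sum(math.sqrt(p * q) for p, q in zip(P, Q))


def kolmogorov(P, Q):
    """Kolmogorov distance (1/2) sum_x |P(x) - Q(x)|."""
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))


def diagonal_qubit_fidelity(P, Q):
    """Square root fidelity between diag(P) and diag(Q) for qubits,
    using the 2x2 closed form (an assumption of this sketch)."""
    overlap = P[0] * Q[0] + P[1] * Q[1]
    dets = P[0] * P[1] * Q[0] * Q[1]
    return math.sqrt(overlap + 2.0 * math.sqrt(dets))


# Hypothetical example distributions.
P = [0.5, 0.5]
Q = [0.9, 0.1]

print(bhattacharyya(P, Q), diagonal_qubit_fidelity(P, Q))
print(kolmogorov(P, Q))
```

For diagonal states the eigenvalues of the difference are simply P(x) - Q(x), so the trace distance coincides with the Kolmogorov distance by inspection.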

As is manifested in these examples, density matrices 
can be thought of as generalizations of classical proba- 
bility distributions, which include the latter as a special 
case. However, in many quantum information scenarios, 
one often deals with an even more general concept, which 
is a hybrid between the quantum and classical cases. This 
is the concept of a probabilistic ensemble of quantum 
states, i.e., a classical probability distribution of density 
matrices. Ensembles of quantum states describe situa- 
tions in which a quantum system can take a number of 
different states at random according to some probability 
distribution. Such a situation is, for example, the out- 
come of a quantum measurement. A quantum measure- 
ment can be regarded as a stochastic quantum channel 
that outputs different quantum states with probabilities 
that depend on the input state according to the generalized Born rule [2]. When the measurement is projective, 
the possible output states are orthogonal and the output 
ensemble can be regarded as a classical one. However, in 
the case of generalized measurements the states need not 
be orthogonal, and the output of the channel is a genuine 
quantum ensemble. 

A quantum state is said to "...capture the best infor- 
mation available about how a quantum system will react 
in this or that experimental situation" [13]. Accordingly, 
a quantum ensemble gives the best information available 
about how a quantum system will react in this or that ex- 
perimental situation when the choice of experiment can 
be made conditional on some classical side information. 
The uses or applications of the quantum system will de- 
pend strongly on the particular quantum states that ap- 
pear in the ensemble and on their probabilities. 

It should be noted that in the context of resource theory [3], a protocol consisting of allowed transformations 
generally involves measurements, and the resource avail- 
able after a measurement is given by the average resource 
of the resulting ensemble. For example, the restriction 
to local operations and classical communication (LOCC) 
naturally gives rise to entanglement as a resource, which 
is quantified by an entanglement monotone — a function 
which does not increase on average under LOCC [?]. 
In this sense, entanglement can be thought of as a func- 
tion defined on ensembles. Ensembles of quantum states 
have various other applications in quantum information 
theory, with particularly notable ones in quantum com- 
munication, e.g., for representing sources of quantum 
states used for communication [13, ?], or for describing "static resources" of shared classical-quantum 
correlations in multi-partite systems [19]. 



Even though various measures of distance and fidelity 
between quantum states have been studied, similar mea- 
sures for ensembles of states have been lacking. With 
the development of quantum technology, it becomes im- 
portant to be able to rigorously compare different ex- 
perimental schemes and assess the degree to which they 
differ from ideal ones. The existing measures of distance 
and fidelity between quantum states are sufficient for this 
purpose when the system of interest at a given stage of 
the experiment is described by a single quantum state. 
These measures can also be used to define distance and 
fidelity between deterministic quantum operations, i.e., 
completely positive trace-preserving (CPTP) maps [20]. 
However, in many situations an experiment may involve 
states obtained randomly according to some probabil- 
ity distribution, such as the states obtained during the 
process of entanglement concentration [21], or the states 
resulting from the measurement of an error syndrome 
during an error-correction protocol [22], or simply a 
source of quantum states used for communication. It 
is therefore important to have a distinguishability mea- 
sure between two ensembles of states. Furthermore, the 
tools of quantum information involve not only CPTP 
maps but also stochastic quantum operations (general- 
ized measurements), and a figure of merit comparing two 
such operations (e.g., a real one with an ideal one) would 
require a quantitative comparison between their output 
ensembles. Rigorous measures that compare generalized 
measurements would be useful, in particular, for assess- 
ing the performance of quantum detectors, which can 
now be characterized experimentally [26] through quantum detector tomography [?], [24], [?]. 

The purpose of this paper is to propose measures of dis- 
tance and fidelity between ensembles of quantum states 
and use them to define distance and fidelity between gen- 
eralized measurements. The rest of the paper is organized 
as follows. In Sec. II, we review the concept of an ensem- 
ble of quantum states and establish nomenclature. In 
Sec. III, we discuss some basic properties that we expect a 
measure of distinguishability between ensembles to have, 
and rule out several naive candidates. In Sec. IV, we pro- 
pose measures of distance and fidelity of a Kantorovich 
type and study their properties. We first introduce the 
measure of distance on the basis of intuitive considera- 
tions concerning the ability of states obtained randomly 
from one ensemble to mimic states obtained randomly 
from the other ensemble. The measure is based on the 
trace distance between states and satisfies a number of 
desirable properties. In addition to the standard distance 
properties, it is jointly convex, monotonic under averag- 
ing of the ensembles and under CPTP maps. When the 
ensembles are discrete, the measure is equivalent to a lin- 
ear program and can be computed efficiently in the size 
of the set of states participating in the ensembles. We 
show that for simple limiting cases, the distance between 
ensembles reduces to intuitive expressions involving the 
trace distance between states. We introduce a measure 
of fidelity between ensembles in a similar fashion. The 
fidelity satisfies properties analogous to those of the distance and can also be computed as a linear program. 
We provide operational interpretations of both quanti- 
ties. We show that for the case when the measures are 
based on the trace distance and the standard fidelity, 
the measures are not monotonic under generalized mea- 
surements. We explain why this is natural considering 
the operational interpretations of the quantities and de- 
rive necessary and sufficient conditions which the basic 
measures of distance or fidelity between states have to 
satisfy in order for the corresponding Kantorovich mea- 
sures to be monotonic under measurements. In Sec. V, 
we propose measures of distance and fidelity which make 
use of the extended-Hilbert-space (EHS) representation 
of ensembles [19]. We argue that to every ensemble of 
quantum states there is a corresponding class of valid 
EHS representations and provide a rigorous definition of 
this class. We then define the measures as a minimum 
(maximum) of the distance (fidelity) between all possible 
EHS representations of the ensembles being compared. 
We show that these definitions can be simplified and are 
equivalent to convex optimization problems. We also pro- 
vide equivalent formulations without reference to an ex- 
tended Hilbert space. These quantities are based on the 
trace distance and the square root fidelity and inherit all 
their celebrated properties such as joint convexity in the 
case of the trace distance or strong concavity in the case 
of the fidelity. In addition, they are monotonic under 
averaging of the ensembles, as well as under generalized 
measurements. The latter property can be regarded as 
a generalization of the monotonicity under CPTP maps 
of the trace distance and the square root fidelity. The 
EHS measures are upper (lower) bounded by the Kan- 
torovich distance (fidelity). We provide operational in- 
terpretations for the EHS measures too. In Sec. VI, we 
present a novel interpretation of the standard fidelity be- 
tween mixed states as a maximum of the fidelity between 
all pure-state ensembles from which the mixed states be- 
ing compared can be constructed. The fidelity between 
pure-state ensembles used in this definition is of the EHS 
type but can be expressed without any reference to fi- 
delity between mixed states and has a form which can be 
regarded as a generalization of the Bhattacharyya over- 
lap. In Sec. VII, we use the measures between ensembles 
of quantum states to define distance and fidelity between 
generalized measurements. We consider two definitions, 
one based on the Jamiolkowski isomorphism [27] and another based on worst-case comparison, and discuss their 
properties. We also propose distance and fidelity between positive operator-valued measures (POVMs). In 
Sec. VIII, we conclude. 



II. ENSEMBLES OF QUANTUM STATES 

Let $\mathcal{B}(\mathcal{H})$ denote the set of linear operators on a finite-dimensional Hilbert space $\mathcal{H}$. 
For the purposes of this paper, a (probabilistic) ensemble of quantum states is 
a set of pairs $\{(p_x, \rho_x)\}$ of probabilities $p_x$ ($p_x \geq 0$, $\sum_x p_x = 1$) and distinct density 
matrices $\rho_x \in \mathcal{B}(\mathcal{H})$ ($\rho_x \geq 0$, $\mathrm{Tr}\,\rho_x = 1$), $\rho_x \neq \rho_y$ for 
$x \neq y$. For simplicity, we will assume that the set of states participating in an 
ensemble is discrete (i.e., the index $x$ runs over a countable set), although we expect that our considerations 
extend to non-discrete ensembles as well. We will use the 
concept of ensemble of states to describe situations in 
which a system takes a state $\rho_x$ at random with probability $p_x$. The statement that a system takes the state 
$\rho_x$ means that there exists classical information about the 
identity of the state. This is to be distinguished from the 
situation in which no information about the identity of 
the state exists or can be obtained. In the latter case, for 
all practical purposes, the average density matrix of the 
ensemble, $\bar{\rho} = \sum_x p_x \rho_x$, provides a complete description 
of the state of the system. 

An example of an ensemble of states is the output of 
a non-destructive generalized measurement. Under the 
most general type of quantum measurement, a density 
matrix $\rho$ transforms as 

$$\rho \rightarrow \rho_i = \frac{\mathcal{M}_i(\rho)}{\mathrm{Tr}\,\mathcal{M}_i(\rho)}, \quad \text{with probability } p_i = \mathrm{Tr}\,\mathcal{M}_i(\rho), \tag{3}$$

where $\mathcal{M}_i(\cdot) = \sum_j M_{ij}(\cdot)M_{ij}^{\dagger}$ is the measurement superoperator corresponding to 
measurement outcome $i$. (The operators $M_{ij}$ satisfy the completeness relation 
$\sum_{i,j} M_{ij}^{\dagger}M_{ij} = I$.) Note that different measurement outcomes do not necessarily yield different 
output states. For example, both outcomes of a measurement on a 
qubit system with measurement superoperators $\mathcal{M}_1(\cdot) = |0\rangle\langle 0|(\cdot)|0\rangle\langle 0|$ 
and $\mathcal{M}_2(\cdot) = |0\rangle\langle 1|(\cdot)|1\rangle\langle 0|$ leave the system in the state 
$|0\rangle\langle 0|$, although they provide information 
about the input state. If $\{\rho_x\}$ is the set of distinct output 
states, each occurring with probability $p_x = \sum_{i:\,\rho_i = \rho_x} p_i$, 
the ensemble of post-measurement states resulting from 
the stochastic transformation (3) is $\{(p_x, \rho_x)\}$. 
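
The construction of the post-measurement ensemble from Eq. (3) can be sketched as follows for the qubit example in the text. The function names, the merging tolerance, and the chosen input state are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def post_measurement_ensemble(outcomes, rho, tol=1e-12):
    """outcomes[i] lists the Kraus operators M_ij of outcome i.

    Returns a list of (p_x, rho_x) in which outcomes that produce the
    same output state are merged, with their probabilities added.
    """
    n = len(rho)
    ensemble = []
    for kraus_ops in outcomes:
        unnormalized = [[0j] * n for _ in range(n)]
        for m in kraus_ops:
            term = matmul(matmul(m, rho), dagger(m))
            unnormalized = [[unnormalized[r][c] + term[r][c] for c in range(n)]
                            for r in range(n)]
        p_i = complex(trace(unnormalized)).real
        if p_i <= tol:
            continue  # this outcome never occurs for the given input
        rho_i = [[entry / p_i for entry in row] for row in unnormalized]
        for idx, (p_x, rho_x) in enumerate(ensemble):
            same = all(abs(rho_x[r][c] - rho_i[r][c]) <= tol
                       for r in range(n) for c in range(n))
            if same:
                ensemble[idx] = (p_x + p_i, rho_x)
                break
        else:
            ensemble.append((p_i, rho_i))
    return ensemble


# Kraus operators of the example: M_1 = |0><0| and M_2 = |0><1|.
M1 = [[1, 0], [0, 0]]
M2 = [[0, 1], [0, 0]]
rho_in = [[0.3, 0.0], [0.0, 0.7]]  # hypothetical input state

ensemble = post_measurement_ensemble([[M1], [M2]], rho_in)
print(ensemble)
```

Both outcomes leave the system in the same state, so the resulting ensemble contains a single element even though the two outcomes occur with different probabilities.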

Let $\{(p_x, \rho_x)\}$ be an ensemble of density matrices over 
a Hilbert space $\mathcal{H}$. If $\Omega_1$ is the set of all density matrices 
$\rho_x$ that participate in the ensemble, we can equivalently 
represent the ensemble as a probability distribution $P(\rho)$, 
$\rho \in \Omega_1$ ($P(\rho_x) = p_x$), over the set $\Omega_1$. Consider a second 
ensemble, $Q(\sigma)$, $\sigma \in \Omega_2$, where the set $\Omega_2$ is not 
necessarily equal to $\Omega_1$. We can think of the two ensembles as corresponding to probability distributions over 
the same set, by extending the definitions of $P(\rho)$ and 
$Q(\sigma)$ to the larger set $\Omega = \Omega_1 \cup \Omega_2$ through assigning 
zero probabilities to those states that do not participate 
in the respective ensembles. Therefore, without loss of 
generality, we will treat the ensembles that we compare 
as probability distributions $P(\rho)$ and $Q(\rho)$ over the same 
set $\Omega$. (Sometimes, when it is clear from the context, we 
will denote the ensembles we compare simply by $P$ and 
$Q$.) Most generally, the set $\Omega$ can be taken to be the full 
set of density matrices over $\mathcal{H}$, but in this paper we will 
assume that $\Omega$ is discrete. 

The fact that $P(\rho)$ and $Q(\rho)$ are valid probability distributions is expressed in the conditions 

$$\sum_{\rho' \in \Omega} P(\rho') = 1, \quad P(\rho) \geq 0, \quad \forall \rho \in \Omega, \tag{4}$$

$$\sum_{\rho' \in \Omega} Q(\rho') = 1, \quad Q(\rho) \geq 0, \quad \forall \rho \in \Omega. \tag{5}$$

If our world is ultimately quantum, it is natural to ex- 
pect that an ensemble of quantum states must have a 
description in terms of the state of a (possibly larger) 
quantum system. Indeed, there is a correspondence between an ensemble of the form $\{(p_x, \rho_x)\}$ and a state of 
the form 

$$\hat{\rho} = \sum_x p_x \rho_x \otimes |x\rangle\langle x|, \tag{6}$$

where the pointer (or flag) states $\{|x\rangle\}$ are an orthonormal set in the Hilbert space of an auxiliary system of a 
sufficiently large dimension [19]. The pointer states can 
be thought of as carrying the classical information about 
which particular state from the ensemble we are given: 
a measurement of the classical system yields the quantum state $\rho_x$ with probability $p_x$, which is equivalent to 
drawing a state randomly from the ensemble. Conversely, 
if we are given a state drawn randomly from the ensemble, we can record our knowledge about the identity of 
the state in a 'classical' pointer attached to it and forget 
the information about the state since this information is 
stored in the pointer and can always be retrieved. After 
the latter operation, the state of the original system plus 
the pointer system is described by $\sum_x p_x \rho_x \otimes |x\rangle\langle x|$. This 
representation is referred to as an extended-Hilbert-space 
(EHS) representation of an ensemble [19]. For simplicity 
and in order to distinguish the system storing the classical memory from the quantum system, we will use the 
following notation for the pointers: 

$$[x] \equiv |x\rangle\langle x|. \tag{7}$$

In this notation, the state (6) reads 

$$\hat{\rho} = \sum_x p_x \rho_x \otimes [x]. \tag{8}$$



In terms of the description of an ensemble as a probability distribution $P(\rho)$ over a set of states $\Omega$, an EHS 
representation of this type can be written as 

$$\hat{\rho}_P = \sum_{\rho \in \Omega} P(\rho)\, \rho \otimes [\rho], \tag{9}$$

where $\{[\rho]\}$ is an orthonormal set of pure pointer states 
each of which is associated with a unique density 
matrix $\rho \in \Omega$. We will develop this concept further in 
Sec. V. 
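
As an illustration of Eqs. (6)-(9), the sketch below builds an EHS representation as an explicit matrix and checks that discarding the pointer returns the average state. The helper names, the ordering of the tensor factors (system first, pointer second), and the example ensemble are assumptions made for this sketch.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    na, nb = len(a), len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(na * nb)]
            for i in range(na * nb)]


def pointer(x, n):
    """Projector [x] = |x><x| on an n-dimensional pointer space."""
    return [[1.0 if (r == x and c == x) else 0.0 for c in range(n)]
            for r in range(n)]


def ehs_representation(ensemble):
    """ensemble is a list of (p_x, rho_x); returns sum_x p_x rho_x (x) [x]."""
    n = len(ensemble)          # one pointer state per ensemble element
    d = len(ensemble[0][1])    # dimension of the quantum system
    total = [[0.0] * (d * n) for _ in range(d * n)]
    for x, (p_x, rho_x) in enumerate(ensemble):
        block = kron(rho_x, pointer(x, n))
        total = [[total[r][c] + p_x * block[r][c] for c in range(d * n)]
                 for r in range(d * n)]
    return total


def trace_out_pointer(ehs, d, n):
    """Discard the pointer system; returns the average state of the ensemble."""
    return [[sum(ehs[i * n + x][j * n + x] for x in range(n)) for j in range(d)]
            for i in range(d)]


# Hypothetical ensemble of two non-orthogonal qubit states, |0><0| and |+><+|.
zero_projector = [[1.0, 0.0], [0.0, 0.0]]
plus_projector = [[0.5, 0.5], [0.5, 0.5]]
example = [(0.5, zero_projector), (0.5, plus_projector)]

ehs = ehs_representation(example)
average = trace_out_pointer(ehs, 2, 2)
print(average)
```

The pointer keeps the two states perfectly distinguishable in the extended space even though the states themselves are not orthogonal.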



III. NAIVE CANDIDATES 

Before we propose distinguishability measures between 
two ensembles of quantum states, let us first consider 



what properties we expect such measures to have. The 
answer to this question will depend on the operational 
context in which we want to compare the ensembles. 

We could ask, for example, how different on average two states drawn randomly from the two ensembles 
are. Given a measure of distance $d(\rho, \sigma)$ between 
states, the average distance in that sense would be 
$\sum_{\rho, \sigma \in \Omega} P(\rho)Q(\sigma)d(\rho, \sigma)$. This quantity obviously could 
be non-zero even when the two ensembles are identical. 
Similarly, we could look at the average fidelity which can 
be smaller than 1 for identical ensembles. Thus even 
though these quantities have a well defined meaning, they 
are not good measures of distinguishability. 

Another possibility is to look at a distance $d(\bar{\rho}_P, \bar{\rho}_Q)$ 
between the average density matrices $\bar{\rho}_P = \sum_{\rho \in \Omega} P(\rho)\rho$ 
and $\bar{\rho}_Q = \sum_{\rho \in \Omega} Q(\rho)\rho$ of the two ensembles, or the fidelity 
$F(\bar{\rho}_P, \bar{\rho}_Q)$ between them. Obviously, for identical 
ensembles the distance is equal to 0 and the fidelity is 
equal to 1. However, these quantities cannot discriminate between different ensembles that have the same 
average density matrices. Imagine, for example, that an 
experimentalist has at her disposal two devices. The first 
one produces the two-qubit Bell states $\frac{|00\rangle + |11\rangle}{\sqrt{2}}$, 
$\frac{|00\rangle - |11\rangle}{\sqrt{2}}$, $\frac{|01\rangle + |10\rangle}{\sqrt{2}}$, 
$\frac{|01\rangle - |10\rangle}{\sqrt{2}}$, each occurring with probability 1/4, 
together with a classical indicator specifying which state 
is produced. The second device produces the two-qubit 
product states $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$, each occurring with 
probability 1/4, again with an indicator of the identity of 
the state. Although the average states in the two cases 
are the same, the ensembles produced by the two devices 
have very different properties. In the first case, the average entanglement between the two qubits is maximal, 
whereas in the second case it is zero. Therefore, in order to capture the difference between two ensembles, we 
would like our measure of distance (fidelity) to be 0 (1) 
if and only if $P(\rho) = Q(\rho)$, $\forall \rho \in \Omega$. 
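
The claim that the Bell-state ensemble and the product-state ensemble share the same average density matrix can be checked with a short sketch. The helper names are illustrative, and real amplitudes are assumed throughout.

```python
import math


def projector(vec):
    """|v><v| for a state vector with real amplitudes."""
    return [[a * b for b in vec] for a in vec]


def average_state(ensemble):
    """Average density matrix sum_x p_x rho_x of a list of (p_x, rho_x)."""
    d = len(ensemble[0][1])
    return [[sum(p * rho[r][c] for p, rho in ensemble) for c in range(d)]
            for r in range(d)]


s = 1.0 / math.sqrt(2)
# Basis ordering |00>, |01>, |10>, |11>.
bell_vectors = [[s, 0, 0, s], [s, 0, 0, -s], [0, s, s, 0], [0, s, -s, 0]]
product_vectors = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

bell_ensemble = [(0.25, projector(v)) for v in bell_vectors]
product_ensemble = [(0.25, projector(v)) for v in product_vectors]

print(average_state(bell_ensemble))
print(average_state(product_ensemble))
```

Both averages equal the maximally mixed two-qubit state, so any measure that depends only on the average state assigns zero distance to these two very different ensembles.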

Measures of distance and fidelity which satisfy the latter requirement could be any measures of distance and 
fidelity between probability distributions which treat $\rho$ 
as a classical variable. Consider, for example, the Kolmogorov distance 
$\frac{1}{2}\sum_{\rho \in \Omega} |P(\rho) - Q(\rho)|$. Note that this distance is precisely equal to the trace 
distance between the EHS representations of the two ensembles of type (9), 

$$\Delta(\hat{\rho}_P, \hat{\rho}_Q) = \frac{1}{2}\Big\| \sum_{\rho \in \Omega} P(\rho)\, \rho \otimes [\rho] - \sum_{\sigma \in \Omega} Q(\sigma)\, \sigma \otimes [\sigma] \Big\|. \tag{10}$$

In a similar manner, we could look at the Bhattacharyya overlap $\sum_{\rho \in \Omega} \sqrt{P(\rho)Q(\rho)}$, which is 
equal to the fidelity between the two EHS representations of type (9). 
Such measures, however, do not take into account the 
quantum-mechanical aspect of the variables $\rho$. If the two 
distributions $P$ and $Q$ have supports on non-overlapping 
subsets of $\Omega$, the above distance (fidelity) would be maximal (minimal), but as we mentioned earlier, two distinct 
density matrices are not necessarily distinguishable (they 
often behave as if they are the same state) and we would 
like our distance and fidelity to capture this property. In 
particular, in the special case where each of the two ensembles consists of a single state, we would like the 
measures between the two ensembles to be equal to the distance or fidelity between the respective states. If we used 
the above distance (fidelity) between classical probability 
distributions in this case, we would obtain a maximum 
(minimum) value even if the two states are very similar. 
At the same time, it is natural to expect that a distance 
between ensembles would reduce to a distance between 
classical probability distributions when the states participating in the ensembles are orthogonal. 
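
The shortcoming of the purely classical measures for single-state ensembles can be seen in a small sketch. It uses the standard fact that the trace distance between two pure states equals sqrt(1 - |<psi|phi>|^2). The function names, the angle, and the states are illustrative assumptions.

```python
import math


def kolmogorov(P, Q):
    """Kolmogorov distance (1/2) sum |P - Q|, treating states as classical labels."""
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))


def pure_state_trace_distance(psi, phi):
    """Trace distance between pure states: sqrt(1 - |<psi|phi>|^2)."""
    inner = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return math.sqrt(max(1.0 - abs(inner) ** 2, 0.0))


theta = 0.01  # hypothetical small angle between the two states
psi = [1 + 0j, 0j]
phi = [complex(math.cos(theta)), complex(math.sin(theta))]

# Omega = {psi, phi}: ensemble P contains only psi, ensemble Q only phi.
P = [1.0, 0.0]
Q = [0.0, 1.0]

classical_distance = kolmogorov(P, Q)
state_distance = pure_state_trace_distance(psi, phi)
print(classical_distance, state_distance)
```

The classical distance is maximal because the labels differ, while the two states themselves are almost indistinguishable.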



IV. DISTANCE AND FIDELITY OF A 
KANTOROVICH TYPE 

A. Motivating the definitions 

The above examples suggest that distinguishability 
measures with the desired properties may have to be non- 
trivial functions of the probability distributions and the 
set of states participating in the ensembles. Heuristically, 
a distance (fidelity) between two quantum states can be 
regarded as a measure of the extent to which the two 
states do not (do) behave as if they are the same state 
(the precise meaning of this statement depends on the op- 
erational meaning of the distance (fidelity) in question). 
In a similar manner, we would expect a distance (fidelity) 
between two ensembles of quantum states to compare the 
extent to which the two ensembles do not (do) "behave" 
as if they are the same ensemble. Since the ensemble 
is a statistical concept which describes the situation of 
having particular states with particular probabilities, we 
would like to compare the extent to which states drawn 
randomly from one ensemble can be used to mimic states 
drawn randomly from the other ensemble. 

When states drawn randomly from the ensemble 
$\{(Q(\sigma), \sigma)\}$ are used to mimic states drawn from the ensemble $\{(P(\rho), \rho)\}$, a given state $\sigma$ 
obtained according to the distribution $Q(\sigma)$ most generally can be taken with 
different probabilities to pass off as different states $\rho$ from 
$\{(P(\rho), \rho)\}$. In other words, the process of mimicking 
one ensemble using the other one as a resource can be described by a transition probability matrix whose elements 
$T(\rho|\sigma)$, $\rho, \sigma \in \Omega$, describe the probabilities with which 
the state $\sigma$ sampled from the distribution $Q(\sigma)$ is taken 
to pass off as the state $\rho$ sampled from $P(\rho)$. The requirement that under this simulation the probabilities are 
consistent with the probabilities $P(\rho)$ and $Q(\sigma)$, respectively, 
is expressed in the condition $\sum_{\sigma \in \Omega} T(\rho|\sigma)Q(\sigma) = P(\rho)$. 
The fact that $T(\rho|\sigma)$ describe valid transition probabilities imposes the conditions $T(\rho|\sigma) \geq 0$, 
$\forall \rho, \sigma \in \Omega$, and $\sum_{\rho \in \Omega} T(\rho|\sigma) = 1$, $\forall \sigma \in \Omega$. 

In order to measure how much the state $\sigma$ fails to 
mimic the state $\rho$, we can use any measure of distance 
between states. In this paper, we will concentrate on the 
case of the trace distance, $\Delta(\rho, \sigma)$ (Eq. (2)). To measure the degree to which a map $T(\rho|\sigma)$ from one 
ensemble to the other fails to mimic the latter, we propose 
to use the average distance between the actual states 
and those that they mimic: $\sum_{\rho, \sigma \in \Omega} T(\rho|\sigma)Q(\sigma)\Delta(\rho, \sigma)$. 
We can write this expression in an explicitly symmetric form by introducing the joint probability 
distribution $\Pi(\rho, \sigma) = T(\rho|\sigma)Q(\sigma)$ which satisfies the 
marginal conditions $\sum_{\sigma \in \Omega} \Pi(\rho, \sigma) = P(\rho)$, $\forall \rho \in \Omega$, and 
$\sum_{\rho \in \Omega} \Pi(\rho, \sigma) = Q(\sigma)$, $\forall \sigma \in \Omega$: 

$$D_{\Pi}(P, Q) = \sum_{\rho, \sigma \in \Omega} \Pi(\rho, \sigma)\Delta(\rho, \sigma). \tag{11}$$

Clearly, different choices of the map $T(\rho|\sigma)$ (or equivalently, of $\Pi(\rho, \sigma)$) can yield different values 
for the quantity (11). Therefore, we define the distance between the 
two ensembles as the minimum of the quantity (11) over 
all possible choices of $\Pi(\rho, \sigma)$, i.e., we choose the optimal 
mimicking strategy. 

Definition 1 (Kantorovich distance). Let $P(\rho)$ 
and $Q(\rho)$, $\rho \in \Omega$, be two ensembles (probability distributions over $\Omega$), which we denote by $P$ and $Q$ for short. 
Then 

$$D^K(P, Q) = \min_{\Pi} \sum_{\rho, \sigma \in \Omega} \Pi(\rho, \sigma)\Delta(\rho, \sigma), \tag{12}$$

where the minimum is taken over all joint probability 
distributions $\Pi(\rho, \sigma)$ with marginals $\sum_{\sigma \in \Omega} \Pi(\rho, \sigma) = P(\rho)$, 
$\forall \rho \in \Omega$, and $\sum_{\rho \in \Omega} \Pi(\rho, \sigma) = Q(\sigma)$, $\forall \sigma \in \Omega$. 
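
For ensembles supported on two states each, the minimization in Definition 1 can be carried out by hand, which gives a compact sketch. This is only the two-by-two special case: larger discrete ensembles require a general linear-programming solver, which is not shown here. The function names and the example data are illustrative assumptions.

```python
import math


def pure_state_trace_distance(psi, phi):
    """Trace distance between pure states: sqrt(1 - |<psi|phi>|^2)."""
    inner = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return math.sqrt(max(1.0 - abs(inner) ** 2, 0.0))


def kantorovich_two_by_two(P, Q, cost):
    """Minimize sum_ij Pi[i][j] * cost[i][j] over couplings of P and Q.

    Rows of Pi are indexed by the states of P, columns by the states of Q.
    All couplings form a one-parameter family
        Pi = [[t, P1 - t], [Q1 - t, 1 - P1 - Q1 + t]],
    with max(0, P1 + Q1 - 1) <= t <= min(P1, Q1). The objective is linear
    in t, so the minimum is attained at one of the two endpoints.
    """
    p1, q1 = P[0], Q[0]
    best = None
    for t in (max(0.0, p1 + q1 - 1.0), min(p1, q1)):
        plan = [[t, p1 - t], [q1 - t, 1.0 - p1 - q1 + t]]
        value = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
        if best is None or value < best[0]:
            best = (value, plan)
    return best


# Hypothetical example: both ensembles live on Omega = {|0>, |+>}.
ket0 = [1 + 0j, 0j]
ket_plus = [(1 + 0j) / math.sqrt(2), (1 + 0j) / math.sqrt(2)]
delta = pure_state_trace_distance(ket0, ket_plus)
cost = [[0.0, delta], [delta, 0.0]]
P = [0.5, 0.5]
Q = [0.8, 0.2]

value, plan = kantorovich_two_by_two(P, Q, cost)
print(value, plan)
```

In this example the optimal plan leaves as much probability as possible in place and moves only the excess weight between the two non-orthogonal states, so the result is smaller than the classical Kolmogorov distance between P and Q.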

The quantity (12) is of the same form as the Kantorovich formulation of the optimal transportation problem [28], 
which is a relaxation of a problem studied in 
1781 by Monge. In 1975, Kantorovich received the Nobel 
Prize in Economics, together with Koopmans, for their 
contributions to the theory of optimum allocation of resources, and he is considered to be one of the fathers of 
linear programming. The optimal transportation problem can be cast in the spirit of its original formulations 
as follows: 

Assume you have to transport the coal produced in 
some mines $X$ to the factories $Y$. The amounts produced in each mine $\{P_1, P_2, \ldots\}$ as well as the needs for 
each factory $\{Q_1, Q_2, \ldots\}$ are given. There is a cost per 
unit of mass $c(x, y)$ to move coal from mine $x$ to factory $y$. The problem is to find the optimal transportation 
plan or transportation map $T(y|x)$, i.e., for every mine $x$ 
determine how much material has to be carried to every 
factory $y$ so as to minimize the overall cost. 

The analogy with the above definition (12) is straightforward: mines and factories play the role of the quantum 
states $\rho$ and $\sigma$ in each ensemble respectively, and the cost 
function is given by the trace distance. Kantorovich's formulation extended also to non-discrete probability 
measures [29] and was one of the first infinite-dimensional 
linear programming problems to be considered. If the 
probability measures are defined over a metric space and 
the cost function is taken to be the corresponding distance function, the optimal average cost is known as 
the Kantorovich distance (also referred to as Wasserstein 
distance [30]). The optimal transportation problem is 
now an active field of research with tight connections 
with problems in geometry, probability theory, differential equations, fluid mechanics, economics and image or 
data processing. 

Based on the same idea we can define a fidelity between 
two ensembles, which we will refer to as the Kantorovich 
fidelity. 

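Definition 2 (Kantorovich fidelity). The Kantorovich fidelity between the ensembles $P(\rho)$ and $Q(\rho)$, 
$\rho \in \Omega$, is 

$$F^K(P, Q) = \max_{\Pi} \sum_{\rho, \sigma \in \Omega} \Pi(\rho, \sigma)F(\rho, \sigma), \tag{13}$$

where $F(\rho, \sigma)$ is the square root fidelity between $\rho$ and 
$\sigma$ (Eq. (1)), and the maximum is taken over all joint probability distributions $\Pi(\rho, \sigma)$ that satisfy 
$\sum_{\sigma \in \Omega} \Pi(\rho, \sigma) = P(\rho)$, $\forall \rho \in \Omega$, and 
$\sum_{\rho \in \Omega} \Pi(\rho, \sigma) = Q(\sigma)$, $\forall \sigma \in \Omega$. 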
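
As a worked illustration of Definitions 1 and 2, consider the simplest nontrivial case in which both ensembles are supported on the same two states. The parametrization below is a sketch introduced for illustration and is not taken from the original text.

```latex
% Two-state set \Omega = \{\rho_1, \rho_2\}, with P = (P_1, 1 - P_1) and Q = (Q_1, 1 - Q_1).
% Every joint distribution with these marginals has the form
\Pi(\rho_1,\rho_1) = t, \quad
\Pi(\rho_1,\rho_2) = P_1 - t, \quad
\Pi(\rho_2,\rho_1) = Q_1 - t, \quad
\Pi(\rho_2,\rho_2) = 1 - P_1 - Q_1 + t,
\qquad \max(0, P_1 + Q_1 - 1) \leq t \leq \min(P_1, Q_1).

% Using \Delta(\rho,\rho) = 0 and F(\rho,\rho) = 1:
\sum_{\rho,\sigma} \Pi(\rho,\sigma)\Delta(\rho,\sigma) = (P_1 + Q_1 - 2t)\,\Delta(\rho_1,\rho_2),
\qquad
\sum_{\rho,\sigma} \Pi(\rho,\sigma)F(\rho,\sigma) = 1 - (P_1 + Q_1 - 2t)\,\bigl(1 - F(\rho_1,\rho_2)\bigr).

% Both are optimized at t = \min(P_1, Q_1), where P_1 + Q_1 - 2t = |P_1 - Q_1|:
D^K(P,Q) = |P_1 - Q_1|\,\Delta(\rho_1,\rho_2),
\qquad
F^K(P,Q) = 1 - |P_1 - Q_1|\,\bigl(1 - F(\rho_1,\rho_2)\bigr).
```

For orthogonal states, where $\Delta(\rho_1,\rho_2) = 1$ and $F(\rho_1,\rho_2) = 0$, these expressions reduce to the Kolmogorov distance and the Bhattacharyya overlap of two binary distributions only in the case of the distance; the fidelity reduces to $1 - |P_1 - Q_1|$, which is the optimal-coupling overlap rather than the Bhattacharyya coefficient.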



B. Properties of the Kantorovich distance 

Let $\mathcal{P}_\Omega$ denote the set of probability distributions over 
a set of density matrices $\Omega$. 

Property 1 (Positivity). 

$$D^K(P, Q) \geq 0, \tag{14}$$

with equality 

$$D^K(P, Q) = 0 \ \text{ iff } \ P(\rho) = Q(\rho), \ \forall \rho \in \Omega. \tag{15}$$

Proof. Since all terms in Eq. (12) are non-negative, 
the distance $D^K(P, Q)$ is also non-negative. Obviously, if 
$P(\rho) = Q(\rho)$, $\forall \rho \in \Omega$, we obtain $D^K(P, Q) = 0$ by choosing the joint probability distribution 
$\Pi(\rho, \sigma) = \delta_{\rho,\sigma}P(\rho)$. 
Conversely, assume that $D^K(P, Q) = 0$. This means that 
all terms in Eq. (12) must be zero. Since $\Delta(\rho, \sigma) > 0$ whenever $\rho \neq \sigma$, this can happen only 
if $\Pi(\rho, \sigma) \propto \delta_{\rho,\sigma}$. From the condition for the marginal 
probability distributions, we see that $\Pi(\rho, \sigma) = \delta_{\rho,\sigma}P(\rho)$ 
and $P(\rho) = Q(\rho)$. 

Property 2 (Normalization). 

$$D^K(P, Q) \leq 1, \quad \forall P, Q \in \mathcal{P}_\Omega, \tag{16}$$

with equality 

$$D^K(P, Q) = 1 \tag{17}$$

if and only if the supports of $P$ and $Q$ are orthogonal sets 
of states. 

Proof. Since $\Delta(\rho, \sigma) \leq 1$, then for any given $\Pi(\rho, \sigma)$ 
we have $\sum_{\rho, \sigma \in \Omega} \Pi(\rho, \sigma)\Delta(\rho, \sigma) \leq \sum_{\rho, \sigma \in \Omega} \Pi(\rho, \sigma) = 1$. 
Furthermore, $\Delta(\rho, \sigma) = 1$ if and only if $\rho$ and $\sigma$ are 
orthogonal. Observe that the only non-zero values 
$\Pi(\rho, \sigma)$ of a joint probability distribution that respects 
the marginal distributions $P$ and $Q$ are those for which 
$\rho$ is in the support of $P$ and $\sigma$ is in the support of $Q$. 
Therefore, if $P$ and $Q$ have supports on sets of density 
matrices which are orthogonal, every non-zero component $\Pi(\rho, \sigma)$ in the sum on the right-hand side of Eq. (12) 
will be multiplied by $\Delta(\rho, \sigma) = 1$, which implies that 
$D^K(P, Q) = 1$. Conversely, since $\sum_{\rho, \sigma \in \Omega} \Pi(\rho, \sigma) = 1$, if 
$D^K(P, Q) = 1$, then every non-zero $\Pi(\rho, \sigma)$ on the right-hand side of Eq. (12) must be multiplied by 1, which 
implies that $P$ and $Q$ must have supports on orthogonal 
sets. 

Property 3 (Symmetry). 

$$D^K(P, Q) = D^K(Q, P), \quad \forall P, Q \in \mathcal{P}_\Omega. \tag{18}$$

Proof. The symmetry follows from the definition (12) 
and the symmetry of $\Delta(\rho, \sigma)$. 

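Property 4 (Triangle inequality).

$$D^K(P,R) \leq D^K(P,Q) + D^K(Q,R), \quad \forall P, Q, R \in \mathcal{P}_\Omega. \tag{19}$$

Proof. Let $\Pi^{PQ}(\rho,\sigma)$ and $\Pi^{QR}(\rho,\sigma)$ be the two joint
probability distributions which achieve the minimum in
Eq. (12) for the pairs of distributions $(P,Q)$ and $(Q,R)$,
respectively. Consider the quantity

$$\Pi^{PR}(\rho,\sigma) = \sum_{\kappa\in\Omega} \Pi^{PQ}(\rho,\kappa)\frac{1}{Q(\kappa)}\Pi^{QR}(\kappa,\sigma), \tag{20}$$

where for $Q(\kappa) = 0$ we define
$\Pi^{PQ}(\rho,\kappa)\frac{1}{Q(\kappa)}\Pi^{QR}(\kappa,\sigma) = 0$ (note that if $Q(\kappa) = 0$,
then $\Pi^{PQ}(\rho,\kappa) = \Pi^{QR}(\kappa,\sigma) = 0$, $\forall \rho,\sigma \in \Omega$). One
can readily verify that this is a valid joint probability
distribution with marginals $P$ and $R$. Therefore, we
have

$$\begin{aligned}
D^K(P,R) &\leq \sum_{\rho,\sigma\in\Omega} \Pi^{PR}(\rho,\sigma)\Delta(\rho,\sigma) \\
&\leq \sum_{\rho,\sigma,\kappa\in\Omega} \Pi^{PQ}(\rho,\kappa)\frac{1}{Q(\kappa)}\Pi^{QR}(\kappa,\sigma)\Delta(\rho,\kappa)
 + \sum_{\rho,\sigma,\kappa\in\Omega} \Pi^{PQ}(\rho,\kappa)\frac{1}{Q(\kappa)}\Pi^{QR}(\kappa,\sigma)\Delta(\kappa,\sigma) \\
&= \sum_{\rho,\kappa\in\Omega} \Pi^{PQ}(\rho,\kappa)\Delta(\rho,\kappa) + \sum_{\kappa,\sigma\in\Omega} \Pi^{QR}(\kappa,\sigma)\Delta(\kappa,\sigma) \\
&= D^K(P,Q) + D^K(Q,R),
\end{aligned} \tag{21}$$

where in the second inequality we have used the triangle
inequality for $\Delta$.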
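
The marginals of the glued distribution can be checked explicitly. The following is a short verification sketch, not part of the original proof. It writes the glued distribution as $\Pi^{PR}(\rho,\sigma) = \sum_\kappa \Pi^{PQ}(\rho,\kappa)\,\Pi^{QR}(\kappa,\sigma)/Q(\kappa)$, with the sums over $\kappa$ restricted to $Q(\kappa) > 0$ (the omitted terms vanish), and uses only the marginal conditions of $\Pi^{PQ}$ and $\Pi^{QR}$.

```latex
\begin{align*}
\sum_{\sigma\in\Omega}\Pi^{PR}(\rho,\sigma)
  &= \sum_{\kappa}\frac{\Pi^{PQ}(\rho,\kappa)}{Q(\kappa)}
     \sum_{\sigma\in\Omega}\Pi^{QR}(\kappa,\sigma)
   = \sum_{\kappa}\frac{\Pi^{PQ}(\rho,\kappa)}{Q(\kappa)}\,Q(\kappa)
   = \sum_{\kappa}\Pi^{PQ}(\rho,\kappa) = P(\rho), \\
\sum_{\rho\in\Omega}\Pi^{PR}(\rho,\sigma)
  &= \sum_{\kappa}\Bigl(\sum_{\rho\in\Omega}\Pi^{PQ}(\rho,\kappa)\Bigr)
     \frac{\Pi^{QR}(\kappa,\sigma)}{Q(\kappa)}
   = \sum_{\kappa}Q(\kappa)\,\frac{\Pi^{QR}(\kappa,\sigma)}{Q(\kappa)}
   = \sum_{\kappa}\Pi^{QR}(\kappa,\sigma) = R(\sigma).
\end{align*}
```

Non-negativity is immediate, since every term in the sum is a product of non-negative numbers.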

Property 5 (Joint convexity).

$$D^K\bigl(pP_1 + (1-p)P_2,\ pQ_1 + (1-p)Q_2\bigr) \leq p\,D^K(P_1,Q_1) + (1-p)\,D^K(P_2,Q_2), \tag{22}$$

$\forall P_1, P_2, Q_1, Q_2 \in \mathcal{P}_\Omega$, $\forall p \in [0,1]$.

Proof. Let $\Pi^1(\rho,\sigma)$ and $\Pi^2(\rho,\sigma)$ be two joint
probability distributions which achieve the minimum
in Eq. (12) for the pairs of distributions $(P_1,Q_1)$ and
$(P_2,Q_2)$, respectively. It is immediately seen that

$$\Pi^{12}(\rho,\sigma) = p\,\Pi^1(\rho,\sigma) + (1-p)\,\Pi^2(\rho,\sigma) \tag{23}$$

is a joint probability distribution with marginals $pP_1 + (1-p)P_2$
and $pQ_1 + (1-p)Q_2$. Therefore,

$$\begin{aligned}
D^K\bigl(pP_1 + (1-p)P_2,\ pQ_1 + (1-p)Q_2\bigr)
&\leq \sum_{\rho,\sigma\in\Omega} \Pi^{12}(\rho,\sigma)\Delta(\rho,\sigma) \\
&= p\sum_{\rho,\sigma\in\Omega} \Pi^1(\rho,\sigma)\Delta(\rho,\sigma) + (1-p)\sum_{\rho,\sigma\in\Omega} \Pi^2(\rho,\sigma)\Delta(\rho,\sigma) \\
&= p\,D^K(P_1,Q_1) + (1-p)\,D^K(P_2,Q_2).
\end{aligned} \tag{24}$$

Property 6 (Monotonicity under CPTP maps).

Let $\mathcal{E} : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$, where $\mathcal{H}$ and $\mathcal{H}'$ generally can
have different dimensions, be a completely positive trace-preserving
(CPTP) map. (Any such map can be written
in the Kraus form $\mathcal{E}(\rho) = \sum_i M_i \rho M_i^\dagger$, $\forall \rho \in \mathcal{B}(\mathcal{H})$
[31].) Denote the set of density matrices consisting of
$\mathcal{E}(\rho)$, with $\rho \in \Omega$, by $\Omega_\mathcal{E}$. If we apply the same CPTP
map $\mathcal{E}$ to every state in an ensemble $P(\rho)$, $\rho \in \Omega$, we
obtain another ensemble $P'(\rho')$, $\rho' \in \Omega_\mathcal{E}$. Note that generally
$P(\rho) \neq P'(\mathcal{E}(\rho))$, because the map $\mathcal{E}$ may be such
that it takes two or more different states from $\Omega$ to one
and the same state in $\Omega_\mathcal{E}$, e.g., $\mathcal{E}(\rho_1) = \mathcal{E}(\rho_2)$, $\rho_1 \in \Omega$,
$\rho_2 \in \Omega$, $\rho_1 \neq \rho_2$. (The opposite obviously cannot happen
because every state $\rho$ in $\Omega$ is mapped to a unique state
$\mathcal{E}(\rho) \in \Omega_\mathcal{E}$.) Thus the operation $\mathcal{E}$ induces a map from
the set of probability distributions over $\Omega$ to the set of
probability distributions over $\Omega_\mathcal{E}$. Denote this map by $M_\mathcal{E}$.

Now we can state the property of monotonicity under
CPTP maps as follows: For all CPTP maps $\mathcal{E}$,

$$D^K(P,Q) \geq D^K\bigl(M_\mathcal{E}(P), M_\mathcal{E}(Q)\bigr), \tag{25}$$

where $M_\mathcal{E} : \mathcal{P}_\Omega \to \mathcal{P}_{\Omega_\mathcal{E}}$ is the map induced by $\mathcal{E}$.

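Proof. Let $\Pi(\rho,\sigma)$ be a joint probability distribution
for which the minimum in the definition (12) of $D^K(P,Q)$ is attained.
Observe that
$\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\Delta(\mathcal{E}(\rho),\mathcal{E}(\sigma)) = \sum_{\rho',\sigma'\in\Omega_\mathcal{E}} \Pi'(\rho',\sigma')\Delta(\rho',\sigma')$,
where $\Pi'(\rho',\sigma')$ is a joint
probability distribution over $\Omega_\mathcal{E} \times \Omega_\mathcal{E}$ with marginals
$P'(\rho')$ and $Q'(\sigma')$. This can be seen from the fact that
$P'(\rho') = \sum_x P(\rho_x)$, where the sum is over all $\rho_x \in \Omega$
such that $\rho' = \mathcal{E}(\rho_x)$. Similarly, $Q'(\sigma') = \sum_y Q(\sigma_y)$,
where the sum is over all $\sigma_y \in \Omega$ such that $\sigma' = \mathcal{E}(\sigma_y)$.
Therefore, we have that

$$\begin{aligned}
D^K\bigl(M_\mathcal{E}(P), M_\mathcal{E}(Q)\bigr) &\leq \sum_{\rho',\sigma'\in\Omega_\mathcal{E}} \Pi'(\rho',\sigma')\Delta(\rho',\sigma') \\
&= \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\Delta(\mathcal{E}(\rho),\mathcal{E}(\sigma)) \leq \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\Delta(\rho,\sigma) \\
&= D^K(P,Q),
\end{aligned} \tag{26}$$

where the last inequality follows from the monotonicity
of $\Delta(\rho,\sigma)$ under CPTP maps.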

Corollary (Invariance under unitary maps).

For all unitary maps $\mathcal{U}$,

$$D^K(P,Q) = D^K\bigl(M_\mathcal{U}(P), M_\mathcal{U}(Q)\bigr). \tag{27}$$

The property follows from the fact that unitary maps are
reversible CPTP maps.

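Property 7 (Monotonicity under averaging).

Let $\bar{P}$ denote the singleton ensemble consisting of the
average state of $P(\rho)$, $\bar{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\rho$. Then

$$D^K(P,Q) \geq D^K(\bar{P},\bar{Q}). \tag{28}$$

Proof. Let $\Pi(\rho,\sigma)$ be a joint probability distribution
for which the minimum in the definition (12) of $D^K(P,Q)$
is attained. Since $\Delta(\rho,\sigma)$ is jointly convex [2j], we have

$$\begin{aligned}
D^K(P,Q) &= \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\Delta(\rho,\sigma) \\
&\geq \Delta\Bigl(\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\rho,\ \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\sigma\Bigr) \\
&= \Delta(\bar{\rho}_P,\bar{\rho}_Q) = D^K(\bar{P},\bar{Q}).
\end{aligned} \tag{29}$$

(For the last equality, see Eq. (48) below.)

Corollary. If two distributions are close, their average
states are also close, i.e.,

$$\text{if } D^K(P,Q) \leq \varepsilon, \text{ then } \Delta(\bar{\rho}_P,\bar{\rho}_Q) \leq D^K(P,Q) \leq \varepsilon. \tag{30}$$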

Property 8 (Continuity of the average of a continuous
function). Let $h(\rho)$ be a bounded function,
which is continuous with respect to the distance $\Delta$. Then
the ensemble average of $h(\rho)$, $\bar{h}_P = \sum_{\rho\in\Omega} P(\rho)h(\rho)$, is
continuous with respect to $D^K$.

Proof. The proof is presented in Appendix A. 

Comment. Property 8 naturally reflects the idea of 
states as resources. Assuming that a resource is a con- 
tinuous function of the state, if two ensembles are close, 
their corresponding average resources must also be close. 

Example (Continuity of the Holevo information).
A function of ensembles, which is of great significance
in quantum information theory, is the Holevo
information

$$\chi(P) = S(\bar{\rho}) - \sum_x p_x S(\rho_x). \tag{31}$$

Here $\bar{\rho} = \sum_x p_x \rho_x$ is the average density matrix of the
ensemble $\{(p_x,\rho_x)\}$, which we denote by $P$ for short, and
$S(\rho) = -\mathrm{Tr}(\rho\log\rho)$ is the von Neumann entropy. This
function gives an upper bound on the amount of information
about the index $x$ extractable through measurements
on a state obtained randomly from the ensemble
and is used to define the classical capacity of a quantum
channel under independent uses of the channel [33, 34].
The second term in the expression (31) is the average of
the von Neumann entropy over the ensemble, while the
first term is the von Neumann entropy of the average.
Since $S(\rho)$ is a continuous function, from Property 8 and
the Corollary of Property 7 one can easily see that the
Holevo information is a continuous function of the ensemble
with respect to the Kantorovich distance. It would be
interesting, however, to obtain an explicit bound of that
continuity. For this purpose, we will need the following
lemma.

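Lemma 1. If a function $h(\rho)$ satisfies the continuity
property

$$|h(\rho) - h(\sigma)| \leq g[\Delta(\rho,\sigma)] \tag{32}$$

for some function $g[x]$ that is concave in $x \in [0,1]$, then
the ensemble average of $h(\rho)$ satisfies

$$|\bar{h}_P - \bar{h}_Q| \leq g[D^K(P,Q)]. \tag{33}$$

Proof. Let $\Pi(\rho,\sigma)$ be a joint probability distribution
which attains the minimum in Eq. (12) for the distributions
$P$ and $Q$. Then,

$$\begin{aligned}
|\bar{h}_P - \bar{h}_Q| &= \Bigl|\sum_{\rho\in\Omega} P(\rho)h(\rho) - \sum_{\sigma\in\Omega} Q(\sigma)h(\sigma)\Bigr| \\
&= \Bigl|\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)h(\rho) - \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)h(\sigma)\Bigr| \\
&\leq \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)|h(\rho) - h(\sigma)| \leq \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\,g[\Delta(\rho,\sigma)] \\
&\leq g\Bigl[\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\Delta(\rho,\sigma)\Bigr] = g[D^K(P,Q)],
\end{aligned} \tag{34}$$

where the last inequality follows from the concavity of $g$.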

Theorem 1 (A Fannes-type inequality for the
ensemble average of the von Neumann entropy).

For any two ensembles $P$ and $Q$ of density matrices over
a $d$-dimensional Hilbert space,

$$|\bar{S}_P - \bar{S}_Q| \leq D^K \log_2(d-1) + H((D^K, 1-D^K)), \tag{35}$$

where $D^K$ is the Kantorovich distance between the
ensembles $P$ and $Q$, and $H((D^K, 1-D^K)) =
-D^K\log_2(D^K) - (1-D^K)\log_2(1-D^K)$ is the Shannon
entropy of the binary probability distribution $(D^K, 1-D^K)$.

Comment. This inequality is based on a Fannes-type
inequality for the von Neumann entropy due to Audenaert
[35], which is stronger than the original inequality
by Fannes [36] and provides the sharpest continuity
bound for the von Neumann entropy based on $\Delta$ and $d$.

Proof. In Ref. [35], it was shown that

$$|S(\rho) - S(\sigma)| \leq \Delta\log_2(d-1) + H((\Delta, 1-\Delta)). \tag{36}$$

The theorem follows from Lemma 1 and the fact that the
right-hand side of Eq. (36) is a concave function of $\Delta$.

Corollary (Continuity bound for the Holevo information).
The term $S(\bar{\rho})$ in the expression (31) for
the Holevo information is not an average of a function,
but according to the Corollary of Property 7,
$\Delta(\bar{\sigma},\bar{\rho}) \leq D^K(P,Q)$. The right-hand side of Eq. (36)
is monotonically increasing in the interval $0 \leq \Delta \leq (d-1)/d$
and monotonically decreasing in the interval
$(d-1)/d \leq \Delta \leq 1$. Therefore, we can write

$$|S(\bar{\sigma}) - S(\bar{\rho})| \leq D^K\log_2(d-1) + H((D^K, 1-D^K)) \quad \text{for } 0 \leq D^K \leq (d-1)/d. \tag{37}$$

Combining Eq. (35) and Eq. (37), we obtain

$$|\chi(Q) - \chi(P)| \leq 2D^K\log_2(d-1) + 2H((D^K, 1-D^K)) \quad \text{for } 0 \leq D^K \leq (d-1)/d. \tag{38}$$

For the interval $(d-1)/d < D^K \leq 1$, we can upper bound
$|S(\bar{\sigma}) - S(\bar{\rho})|$ by its maximum value $\log_2(d)$, and we can
write the weaker inequality

$$|\chi(Q) - \chi(P)| \leq \log_2(d) + D^K\log_2(d-1) + H((D^K, 1-D^K)) \quad \text{for } (d-1)/d < D^K \leq 1. \tag{39}$$
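
The two-regime bound on the Holevo information can be evaluated numerically. The sketch below is only an illustration of the formulas above: it assumes the bound is $2[D\log_2(d-1) + H((D,1-D))]$ for $D \leq (d-1)/d$ and $\log_2 d + D\log_2(d-1) + H((D,1-D))$ otherwise, and the function names (`binary_entropy`, `audenaert_bound`, `holevo_continuity_bound`) are hypothetical helpers introduced here.

```python
import math


def binary_entropy(x):
    """Shannon entropy in bits of the binary distribution (x, 1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # limiting value, since x * log2(x) -> 0 as x -> 0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def audenaert_bound(delta, d):
    """Right-hand side of the entropy bound: delta*log2(d-1) + H((delta, 1-delta))."""
    return delta * math.log2(d - 1) + binary_entropy(delta)


def holevo_continuity_bound(dk, d):
    """Upper bound on |chi(Q) - chi(P)| as a function of the distance dk and dimension d."""
    if d < 2:
        raise ValueError("the Hilbert-space dimension d must be at least 2")
    if not 0.0 <= dk <= 1.0:
        raise ValueError("the distance dk must lie in [0, 1]")
    if dk <= (d - 1) / d:
        # Both entropy terms are bounded by the same increasing function of dk.
        return 2.0 * audenaert_bound(dk, d)
    # Beyond (d-1)/d the entropy of the average is bounded by its maximum log2(d).
    return math.log2(d) + audenaert_bound(dk, d)


if __name__ == "__main__":
    for dk in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(dk, holevo_continuity_bound(dk, d=2))
```

At $D^K = (d-1)/d$ the single-entropy bound equals $\log_2 d$, so the two branches agree there and the bound is continuous in $D^K$.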

Property 9 (Stability). Let $P(\rho)$, $\rho \in \Omega$, and $R(\sigma')$,
$\sigma' \in \Omega'$, be two ensembles of quantum states, where $\Omega$
and $\Omega'$ are sets of states of two different systems. Define
the tensor product of the two ensembles as the ensemble
$\{(P(\rho)R(\sigma'), \rho\otimes\sigma')\}$, which we will denote by $P \otimes R$ for
short. Let $P(\rho)$ and $Q(\rho)$ be two ensembles of states in
$\Omega$ and $R(\sigma')$ be an ensemble of states in $\Omega'$. Then,

$$D^K(P \otimes R, Q \otimes R) = D^K(P,Q). \tag{40}$$

Comment. The physical meaning of this property
is that unrelated ensembles do not affect the value of
$D^K(P,Q)$. Even though this may seem a natural property
to expect from a distance, it does not hold in general
even for distance measures between states. For example,
the Hilbert-Schmidt distance $\sqrt{\mathrm{Tr}(\rho-\sigma)^2}$, which has a
well-defined operational meaning [7], is not stable.
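
The contrast between the two state distances is easy to see numerically. The following sketch assumes NumPy is available and uses illustrative states chosen here (two orthogonal qubit states and a maximally mixed ancilla); the helper names are hypothetical. It appends the same ancilla state to both states and compares the distances before and after.

```python
import numpy as np


def hilbert_schmidt_distance(rho, sigma):
    """Hilbert-Schmidt distance sqrt(Tr (rho - sigma)^2)."""
    diff = rho - sigma
    return float(np.sqrt(np.real(np.trace(diff @ diff))))


def trace_distance(rho, sigma):
    """Trace distance (1/2) * ||rho - sigma||_1 for Hermitian inputs."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(eigenvalues)))


# Illustrative choice: two orthogonal qubit states and a maximally mixed ancilla.
rho = np.diag([1.0, 0.0])
sigma = np.diag([0.0, 1.0])
kappa = np.eye(2) / 2.0

hs_before = hilbert_schmidt_distance(rho, sigma)
hs_after = hilbert_schmidt_distance(np.kron(rho, kappa), np.kron(sigma, kappa))
td_before = trace_distance(rho, sigma)
td_after = trace_distance(np.kron(rho, kappa), np.kron(sigma, kappa))

print("Hilbert-Schmidt:", hs_before, "->", hs_after)  # shrinks by sqrt(Tr kappa^2)
print("trace distance: ", td_before, "->", td_after)  # unchanged
```

In this example the Hilbert-Schmidt distance is rescaled by $\sqrt{\mathrm{Tr}\,\kappa^2} = 1/\sqrt{2}$, while the trace distance is unchanged.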

Proof. Let

$$D^K(P \otimes R, Q \otimes R) = \sum_{\rho,\sigma\in\Omega;\ \tau',\kappa'\in\Omega'} \Pi(\rho\otimes\tau', \sigma\otimes\kappa')\,\Delta(\rho\otimes\tau', \sigma\otimes\kappa'), \tag{41}$$

where $\Pi(\rho\otimes\tau', \sigma\otimes\kappa')$ has left and right marginals
$P(\rho)R(\tau')$ and $Q(\sigma)R(\kappa')$, respectively. From the monotonicity
of $\Delta$ under partial tracing it follows that

$$D^K(P \otimes R, Q \otimes R) \geq \sum_{\rho,\sigma\in\Omega} \Pi'(\rho,\sigma)\Delta(\rho,\sigma), \tag{42}$$

where

$$\Pi'(\rho,\sigma) = \sum_{\tau',\kappa'\in\Omega'} \Pi(\rho\otimes\tau', \sigma\otimes\kappa') \tag{43}$$

is a joint probability distribution with left and right
marginals $P(\rho)$ and $Q(\sigma)$, respectively. Therefore,

$$D^K(P \otimes R, Q \otimes R) \geq D^K(P,Q). \tag{44}$$

But by choosing $\Pi(\rho\otimes\tau', \sigma\otimes\kappa') = \Pi(\rho,\sigma)R(\tau')\delta_{\tau'\kappa'}$,
where $\Pi(\rho,\sigma)$ is a joint distribution which attains the
minimum in the definition (12) of $D^K(P,Q)$, and using
the stability of $\Delta$, the equality in Eq. (44) is attained.
This completes the proof.

Property 10 (Linear programming). The task of
finding the optimal $\Pi(\rho,\sigma)$ in Eq. (12) is a linear program
and can be solved efficiently in the cardinality of $\Omega$.

Proof. If the cardinality of $\Omega$ is $N$, we can think
of $\Delta(\rho,\sigma)$, $\rho,\sigma \in \Omega$, as the components $c_\mu$, $\mu = (\rho,\sigma)$,
of an $N^2$-component vector which we will denote by $c$.
The joint probability distribution $\Pi(\rho,\sigma)$ over which we
want to minimize the expression on the right-hand side of
Eq. (12) can similarly be thought of as an $N^2$-component
vector $x$ with components $x_\mu$, $\mu = (\rho,\sigma)$. Thus the task
of finding the optimal $\Pi(\rho,\sigma)$ can be expressed in the
compact form

$$\text{Minimize } c^T x. \tag{45}$$

The constraints $\sum_{\sigma\in\Omega} \Pi(\rho,\sigma) = P(\rho)$, $\forall \rho \in \Omega$, and
$\sum_{\rho\in\Omega} \Pi(\rho,\sigma) = Q(\sigma)$, $\forall \sigma \in \Omega$, can also be expressed
in a compact matrix form as

$$Ax = a, \qquad Bx = b, \tag{46}$$

where $A$ is an $N \times N^2$ matrix with components $A_{\kappa\mu} = \delta_{\kappa\rho}$,
where $\mu = (\rho,\sigma)$ is a double index, $B$ is an $N \times N^2$ matrix
with components $B_{\kappa\mu} = \delta_{\kappa\sigma}$ ($\mu = (\rho,\sigma)$), and $a$ and
$b$ are $N$-component vectors with elements $a_\kappa = P(\kappa)$,
$\kappa \in \Omega$, and $b_\kappa = Q(\kappa)$, $\kappa \in \Omega$, respectively. In addition,
the positivity of the quantities $\Pi(\rho,\sigma)$ amounts to the
constraint

$$x \geq 0. \tag{47}$$

Eqs. (45)-(47) are the canonical form of a linear program,
which can be solved efficiently in the length $N^2$ of the
vector $x$. This completes the proof.
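
The linear program can be set up directly with an off-the-shelf solver. The following is a minimal sketch, not the authors' implementation: it assumes NumPy and SciPy are available, uses `scipy.optimize.linprog` with the HiGHS backend, and the function names `trace_distance` and `kantorovich_distance` as well as the example states are choices made here for illustration. The joint distribution is flattened row by row, so that entry $\rho N + \sigma$ of the vector $x$ holds $\Pi(\rho,\sigma)$.

```python
import numpy as np
from scipy.optimize import linprog


def trace_distance(rho, sigma):
    """Trace distance (1/2) * ||rho - sigma||_1 between two density matrices."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(eigenvalues)))


def kantorovich_distance(states, P, Q):
    """Minimize sum Pi(rho, sigma) * Delta(rho, sigma) over couplings of P and Q."""
    n = len(states)
    # Cost vector c: entry rho * n + sigma holds Delta(rho, sigma).
    c = np.array([trace_distance(r, s) for r in states for s in states])
    # A x = a: summing Pi over sigma gives P; B x = b: summing over rho gives Q.
    A = np.kron(np.eye(n), np.ones((1, n)))
    B = np.kron(np.ones((1, n)), np.eye(n))
    result = linprog(
        c,
        A_eq=np.vstack([A, B]),
        b_eq=np.concatenate([P, Q]),
        bounds=(0, None),  # positivity constraint x >= 0
        method="highs",
    )
    return float(result.fun), result.x.reshape(n, n)


# Illustrative states: |0><0|, |1><1| and |+><+|.
zero = np.array([[1.0, 0.0], [0.0, 0.0]])
one = np.array([[0.0, 0.0], [0.0, 1.0]])
plus = np.array([[0.5, 0.5], [0.5, 0.5]])

# Orthogonal states: the result is the Kolmogorov distance of P and Q.
d_classical, _ = kantorovich_distance([zero, one], [0.7, 0.3], [0.4, 0.6])
# Two singleton ensembles: the result is the trace distance of the two states.
d_singleton, _ = kantorovich_distance([zero, plus], [1.0, 0.0], [0.0, 1.0])
print(d_classical, d_singleton)
```

The two example calls reproduce the classical and singleton limiting cases of the distance, which makes them convenient sanity checks for any implementation.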

It is natural to ask about the properties of the dis- 
tance in certain simple limiting cases. We consider the 
following three cases. 



Limiting case 1 (Two singleton ensembles). If
$P(\rho) = \delta_{\rho\tau}$, $\rho,\tau \in \Omega$, and $Q(\rho) = \delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$, i.e., each
of the ensembles $P$ and $Q$ consists of only a single state,
then the distance between the ensembles is equal to the
distance between the respective states,

$$D^K(P,Q) = \Delta(\tau,\sigma). \tag{48}$$

Proof. Obviously, the only joint probability distribution
with marginals $P$ and $Q$ in this case is $\Pi(\rho,\kappa) =
\delta_{\rho\tau}\delta_{\kappa\sigma}$, so the property follows.

Limiting case 2 (One singleton ensemble). If the
ensemble $Q(\rho)$ consists of only one state $\sigma$, i.e., $Q(\rho) =
\delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$, then the distance between $P(\rho)$ and $Q(\rho)$
is equal to the average distance between a state drawn
from the ensemble $P(\rho)$ and the state $\sigma$,

$$D^K(P,Q) = \sum_{\rho\in\Omega} P(\rho)\Delta(\rho,\sigma). \tag{49}$$

Proof. The property follows from the fact that the
only joint probability distribution with marginals $P$ and
$Q$ in this case is $\Pi(\rho,\kappa) = P(\rho)\delta_{\kappa\sigma}$.

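Limiting case 3 (Classical distributions). If the
set $\Omega$ consists of perfectly distinguishable density matrices,
i.e., $\Delta(\rho,\sigma) = 1 - \delta_{\rho\sigma}$, $\forall \rho,\sigma \in \Omega$, then $D^K(P,Q)$
reduces to the Kolmogorov distance between the classical
probability distributions $P$ and $Q$,

$$D^K(P,Q) = \frac{1}{2}\sum_{\rho\in\Omega} |P(\rho) - Q(\rho)|. \tag{50}$$

Proof. Since in this case the set $\Omega$ consists of orthogonal
states, we can write the right-hand side of Eq. (12)
as

$$\min_\Pi \Bigl(\sum_{\rho\neq\sigma} \Pi(\rho,\sigma)\times 1 + \sum_{\rho\in\Omega} \Pi(\rho,\rho)\times 0\Bigr) = \min_\Pi \Bigl(1 - \sum_{\rho\in\Omega} \Pi(\rho,\rho)\Bigr), \tag{51}$$

where the equality follows from the fact that

$$\sum_{\rho\neq\sigma} \Pi(\rho,\sigma) + \sum_{\rho\in\Omega} \Pi(\rho,\rho) = 1. \tag{52}$$

The minimum in Eq. (51) is achieved when $\sum_{\rho\in\Omega} \Pi(\rho,\rho)$
is maximal, which in turn is achieved when each of the
terms $\Pi(\rho,\rho)$ is maximal. Since the maximum value of
$\Pi(\rho,\rho)$ is $\min(P(\rho),Q(\rho))$, we obtain

$$D^K(Q,P) = 1 - \sum_{\rho\in\Omega} \min(Q(\rho),P(\rho)) = \frac{1}{2}\sum_{\rho\in\Omega} |Q(\rho) - P(\rho)|. \tag{53}$$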
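
The classical limit can be checked with a few lines of code. The sketch below is an illustration under the assumption of perfectly distinguishable states (cost 0 on the diagonal, 1 elsewhere); the function names and the example distributions are hypothetical. It builds a coupling that places $\min(P(\rho),Q(\rho))$ on the diagonal and distributes the leftover mass off the diagonal, then compares its cost with the Kolmogorov distance.

```python
def kolmogorov_distance(P, Q):
    """Kolmogorov distance (1/2) * sum |P - Q| between two classical distributions."""
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))


def diagonal_first_coupling(P, Q):
    """Coupling of P and Q with the largest possible weight on the diagonal."""
    n = len(P)
    coupling = [[0.0] * n for _ in range(n)]
    for k in range(n):
        coupling[k][k] = min(P[k], Q[k])
    # Leftover mass: for each k at most one of the two excesses is non-zero.
    excess_p = [P[k] - coupling[k][k] for k in range(n)]
    excess_q = [Q[k] - coupling[k][k] for k in range(n)]
    i = j = 0
    while i < n and j < n:
        moved = min(excess_p[i], excess_q[j])
        coupling[i][j] += moved  # moved is 0 whenever i == j
        excess_p[i] -= moved
        excess_q[j] -= moved
        if excess_p[i] == 0.0:
            i += 1
        else:
            j += 1
    return coupling


def off_diagonal_mass(coupling):
    """Cost of a coupling when Delta(rho, sigma) = 1 - delta_{rho sigma}."""
    n = len(coupling)
    return sum(coupling[i][j] for i in range(n) for j in range(n) if i != j)


P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
coupling = diagonal_first_coupling(P, Q)
print(off_diagonal_mass(coupling), kolmogorov_distance(P, Q))
```

The agreement of the two printed numbers reflects the identity $\min(a,b) = \frac{1}{2}(a + b - |a - b|)$ summed over the states.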

Comment. Note that we can distinguish two limits
which can be interpreted as comparing classical probability
distributions. One is Limiting case 3: probability
distributions over a set of orthogonal states. The other is
the case where each of the two ensembles consists of a single
state (two singleton ensembles) and the two states are
diagonal in the same basis. In both limits, the distance
$D^K(Q,P)$ reduces to the Kolmogorov distance between
classical distributions.



C. Properties of the Kantorovich fidelity 

The following properties of the Kantorovich fidelity
(13) can be proven similarly to the corresponding properties
of the Kantorovich distance, which is why we present
them without proof.

Property 1 (Positivity and normalization).

$$0 \leq F^K(P,Q) \leq 1, \tag{54}$$

with

$$F^K(P,Q) = 1 \quad \text{iff} \quad P(\rho) = Q(\rho), \ \forall \rho \in \Omega, \tag{55}$$

and

$$F^K(P,Q) = 0 \tag{56}$$

if and only if the supports of $P$ and $Q$ are orthogonal sets
of states.

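Property 2 (Symmetry).

$$F^K(P,Q) = F^K(Q,P), \quad \forall P, Q \in \mathcal{P}_\Omega. \tag{57}$$

Property 3 (Joint concavity).

$$F^K\bigl(pP_1 + (1-p)P_2,\ pQ_1 + (1-p)Q_2\bigr) \geq p\,F^K(P_1,Q_1) + (1-p)\,F^K(P_2,Q_2), \tag{58}$$

$\forall P_1, P_2, Q_1, Q_2 \in \mathcal{P}_\Omega$, $\forall p \in [0,1]$.

Property 4 (Monotonicity under CPTP maps).

For all CPTP maps $\mathcal{E}$,

$$F^K(P,Q) \leq F^K\bigl(M_\mathcal{E}(P), M_\mathcal{E}(Q)\bigr), \tag{59}$$

where $M_\mathcal{E} : \mathcal{P}_\Omega \to \mathcal{P}_{\Omega_\mathcal{E}}$ is the map induced by $\mathcal{E}$.

Corollary (Invariance under unitary maps). For
all unitary maps $\mathcal{U}$,

$$F^K(P,Q) = F^K\bigl(M_\mathcal{U}(P), M_\mathcal{U}(Q)\bigr), \tag{60}$$

where $M_\mathcal{U} : \mathcal{P}_\Omega \to \mathcal{P}_{\Omega_\mathcal{U}}$ is the map induced by $\mathcal{U}$.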

Property 5 (Monotonicity under averaging).

Let $\bar{P}$ denote the singleton ensemble consisting of the
average state of $P(\rho)$, $\bar{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\rho$. Then

$$F^K(P,Q) \leq F^K(\bar{P},\bar{Q}). \tag{61}$$

Corollary. If two distributions are close, their average
states are also close, i.e.,

$$\text{if } F^K(P,Q) \geq 1 - \varepsilon, \text{ then } F(\bar{\rho}_P,\bar{\rho}_Q) \geq 1 - \varepsilon. \tag{62}$$

Property 6 (Stability). Let $P(\rho)$ and $Q(\rho)$ be two
ensembles of states in $\Omega$ and $R(\sigma')$ be an ensemble of
states in $\Omega'$. Then,

$$F^K(P \otimes R, Q \otimes R) = F^K(P,Q). \tag{63}$$

Property 7 (Linear programming). The task of
finding the optimal $\Pi(\rho,\sigma)$ in Eq. (13) is a linear program
and can be solved efficiently in the cardinality of $\Omega$.

Limiting case 1 (Two singleton ensembles). If
$P(\rho) = \delta_{\rho\tau}$, $\rho,\tau \in \Omega$, and $Q(\rho) = \delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$, i.e., each
of the ensembles $P$ and $Q$ consists of only a single state,
then the fidelity between the ensembles is equal to the
fidelity between the respective states,

$$F^K(P,Q) = F(\tau,\sigma). \tag{64}$$

Limiting case 2 (One singleton ensemble). If the
ensemble $Q(\rho)$ consists of only one state $\sigma$, i.e., $Q(\rho) =
\delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$, then the fidelity between $P(\rho)$ and $Q(\rho)$ is
equal to the average fidelity between a state drawn from
the ensemble $P(\rho)$ and the state $\sigma$,

$$F^K(P,Q) = \sum_{\rho\in\Omega} P(\rho)F(\rho,\sigma). \tag{65}$$

Limiting case 3 (Classical distributions). If the
set $\Omega$ consists of perfectly distinguishable density matrices,
i.e., $F(\rho,\sigma) = \delta_{\rho\sigma}$, $\forall \rho,\sigma \in \Omega$, then $F^K(P,Q)$ reduces
to the following overlap between the classical probability
distributions over the set $\Omega$:

$$F^K(P,Q) = \sum_{\rho\in\Omega} \min(P(\rho),Q(\rho)) = 1 - \frac{1}{2}\sum_{\rho\in\Omega} |P(\rho) - Q(\rho)|. \tag{66}$$

Comment. As pointed out earlier, there are two limits
which can be interpreted as corresponding to classical
probability distributions: Limiting case 3 (probability
distributions over a set of orthogonal states), and the
limit of two singleton ensembles where the two states
are diagonal in the same basis. Here, these two limits
yield different results. In the first case, we obtain
Eq. (66), which is a particular type of overlap between
classical probability distributions. In the second case, if
$P(\rho)$ and $Q(\rho)$ are the spectra of the two density matrices,
the fidelity reduces to the Bhattacharyya overlap
$\sum_{\rho\in\Omega} \sqrt{P(\rho)Q(\rho)}$, which upper bounds expression (66).
This reflects the fact that the way $F^K$ treats the overlap
between the 'classical aspect' of the probability distribution
$P(\rho)$ is not a special case of the way it treats the
overlap between two quantum states. We will show in
subsection E that this property is intimately related to
the fact that $F^K$ is not monotonic under measurements.
The fidelity which we propose in Sec. V is monotonic under
measurements and both its classical limits coincide.
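
The gap between the two classical limits of the fidelity can be made concrete. The following sketch compares the overlap $\sum_\rho \min(P(\rho),Q(\rho))$ with the Bhattacharyya overlap $\sum_\rho \sqrt{P(\rho)Q(\rho)}$ for an arbitrary pair of distributions chosen here for illustration; the function names are hypothetical.

```python
import math


def min_overlap(P, Q):
    """Overlap sum_k min(P_k, Q_k), the orthogonal-state limit of the fidelity."""
    return sum(min(p, q) for p, q in zip(P, Q))


def bhattacharyya_overlap(P, Q):
    """Bhattacharyya overlap sum_k sqrt(P_k * Q_k) of two classical distributions."""
    return sum(math.sqrt(p * q) for p, q in zip(P, Q))


P = [0.9, 0.1]
Q = [0.5, 0.5]
print(min_overlap(P, Q), bhattacharyya_overlap(P, Q))
```

Term by term, $\sqrt{pq} \geq \min(p,q)$ for non-negative $p$ and $q$, so the Bhattacharyya overlap can never be the smaller of the two; the limits coincide only when, for every state, the two probabilities are equal or one of them vanishes.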
D. Operational interpretations of the Kantorovich
measures

To further develop our understanding of the meaning 
of the Kantorovich measures, it is useful to illustrate their 
interpretation in the spirit of game theory. Let us con- 
sider the Kantorovich distance first. 

The trace distance is related to the maximum average
probability $p_{\max}(\rho,\sigma)$ with which two equally probable
states $\rho$ and $\sigma$ can be distinguished by a measurement as
follows: $p_{\max}(\rho,\sigma) = \frac{1}{2} + \frac{1}{4}\|\rho - \sigma\| = \frac{1}{2} + \frac{1}{2}\Delta(\rho,\sigma)$. This naturally
suggests the following game scenario. Imagine that Alice
has access to two ensembles of quantum states $P(\rho)$ and
$Q(\rho)$, $\rho \in \Omega$. More precisely, we will assume that she
has at her disposal two sufficiently large pools of states 
in which the relative frequencies of different states are 
approximately equal to the corresponding probabilities 
for these states within a satisfactory precision. Alice has 
to pick one state from one pool and another state from 
the other pool and choose randomly (with equal proba- 
bility) whether to send the first state to Bob and throw 
the other away or vice versa. She has to tell Bob which is 
the pair of states drawn from the two ensembles. Bob's 
task is to distinguish, by performing any operation on 
the received state, from which ensemble the state he re- 
ceives has been drawn. This is repeated until the two 
pools are depleted (the two pools are assumed to have 
equal numbers of states). Bob's success is measured in 
terms of the average number of times he guesses correctly 
the ensemble from which the state he receives has been 
drawn. Alice's goal, on the other hand, is to choose the 
pairs of states from the two ensembles in such a way as 
to make Bob's task as difficult as possible. 

If every time Bob employs the optimal measurement
strategy for distinguishing which state he has been sent,
it is obvious that the optimal strategy for Alice is to pair
the states according to the joint probability distribution
$\Pi(\rho,\sigma)$ which minimizes the right-hand side of Eq. (12),
that is, minimizes the average probability of correctly
distinguishing the two states in each pair by an optimal
measurement. The Kantorovich distance can then be understood
as

$$D^K(P,Q) = 2p^K_{\max}(P,Q) - 1, \tag{67}$$

where $p^K_{\max}(P,Q)$ is Bob's maximal probability of success
when Alice chooses her strategy optimally.

The fidelity $F^K(P,Q)$ can be given a similar operational
interpretation, although a bit more artificial. The
difference is that Bob's task and corresponding measure
of success have to be chosen so that they are given by
the fidelity between the two states which Bob wants to
distinguish at every round. For this purpose, we can use
Fuchs' operational interpretation of the fidelity [12] as the
minimum Bhattacharyya overlap between the statistical
distributions generated by all possible measurements on
the states,

$$F(\tau,\upsilon) = \min_{\{E_i\}} \sum_i \sqrt{\mathrm{Tr}(E_i\tau)}\sqrt{\mathrm{Tr}(E_i\upsilon)}, \tag{68}$$

where the minimum is taken over all positive operators $\{E_i\}$
that form a positive operator-valued measure ($\sum_i E_i = I$).

Then we can modify the game as follows. After sending 
one of the two states to Bob, Alice does not throw away 
the other state, but waits for Bob to tell her the type 
of measurement he performs on his state, and she per- 
forms the same measurement on her state. They record 
their results under many repetitions, and at the end they 
calculate the average of the statistical overlap between 
the resulting distributions of measurement outcomes for 
every pair of states. Bob's task is to minimize this quan- 
tity by appropriately choosing his measurements for ev- 
ery pair of states, while Alice's goal is again to make 
Bob's task as difficult as possible by choosing the pairs 
of states in a suitable manner. 



E. Non-monotonicity under generalized 
measurements 

The trace distance and the fidelity (as well as all 
fidelity-based distance measures between states) are 
monotonic under CPTP maps 0, [2(1 H2|. This prop- 
erty, also known as contractivity, can be understood as 
an expression of the fact that the distinguishability be- 
tween states described by these measures cannot be in- 
creased by performing any operation on the states. One 
may wonder if, when going to the realm of ensembles, 
we should expect a measure of distinguishability between 
ensembles to be monotonic under the more general class 
of stochastic operations, i.e., generalized measurements. 
After all, these are operations that transform ensembles 
into ensembles. We will show that this is not satisfied 
by the Kantorovich distance and fidelity. We will also 
relate this property to the fact that the Kantorovich fi- 
delity yields two different results in the two 'classical' 
limits since a necessary condition for a Kantorovich mea- 
sure to be monotonic under measurements is that both 
its classical limits are the same. This condition, however, 
is not sufficient, as shown by the case of the Kantorovich 
distance. 

Note, however, that our definitions of the Kantorovich
measures were based on the trace distance and the square
root fidelity. In an analogous manner, one can define
Kantorovich measures based on any other distance or fidelity
between states. Non-monotonicity under generalized
measurements is not a problem per se, and we will see
that there is no reason to expect monotonicity, considering
the operational meaning of the Kantorovich measures
based on the trace distance and the square root fidelity.
Nevertheless, it would be useful to have measures such
that the distinguishability between ensembles that they
describe cannot be increased by any possible operation
(see Sec. V). Driven by this motivation, we derive necessary
and sufficient conditions that a measure of distance
or fidelity between states has to satisfy in order for the
corresponding Kantorovich measure to be monotonic under
measurements.

Let us first formulate precisely what we mean by monotonicity
under generalized measurements. As pointed out
earlier, under the most general type of quantum measurement,
the state of a system transforms as in Eq. (3).

Definition 3 (Monotonicity under generalized
measurements). Consider a measurement $\mathcal{M}$ with
measurement superoperators $\{\mathcal{M}_i\}$. Denote the set of
distinct density matrices among all possible outcomes
$\mathcal{M}_i(\rho)/\mathrm{Tr}\,\mathcal{M}_i(\rho)$ over all possible inputs $\rho \in \Omega$ by $\Omega_\mathcal{M}$. If we apply
the same generalized measurement (3) to every state
in an ensemble $P(\rho)$, $\rho \in \Omega$, we obtain another ensemble
$P'(\rho')$, $\rho' \in \Omega_\mathcal{M}$. Thus the generalized measurement (3)
induces a map from the set of probability distributions
over $\Omega$ to the set of probability distributions over $\Omega_\mathcal{M}$.
Denote this map by $M : \mathcal{P}_\Omega \to \mathcal{P}_{\Omega_\mathcal{M}}$. When we say that
a distance function $D(Q,P)$ between ensembles of states
$Q$ and $P$ is monotonically decreasing (or simply monotonic)
under generalized measurements, we mean that for
any generalized measurement (3),

$$D(M(P),M(Q)) \leq D(P,Q), \tag{69}$$

where $M : \mathcal{P}_\Omega \to \mathcal{P}_{\Omega_\mathcal{M}}$ is the map induced by the measurement.
Similarly, monotonicity of a fidelity $F(Q,P)$
means

$$F(M(P),M(Q)) \geq F(P,Q) \tag{70}$$

for any generalized measurement.

Property. The Kantorovich distance based on the
trace distance (Eq. (12)) and the Kantorovich fidelity
based on the square root fidelity (Eq. (13)) are not monotonic
under generalized measurements.

Proof. The proof is presented in Appendix B. 

The lack of monotonicity of the Kantorovich measures
should not be surprising considering
the operational interpretations we discussed in the previous
subsection. Generally, monotonicity under certain
types of operations means that the type of distinguishability
described by the measures cannot be increased under
these operations. However, from the above game scenarios
we see that the distinguishability concerns Bob's
ability to distinguish which of a pair of states Alice has
sent to him, in the case where Alice has chosen the way
she pairs the states in an optimal way. Certainly, by
applying a measurement on the state he receives, Bob
cannot improve his chances of guessing correctly beyond
what he would obtain by doing the optimal measurement.
However, the question of monotonicity we are asking concerns
applying the same measurement to all states in the
original ensembles before Alice has chosen her optimal
strategy. There is no reason to expect that after applying
a measurement on all of the states in the original
ensembles, the optimal strategy that Alice can employ
for the resulting ensembles can only be better than her
optimal strategy for the original ensembles. Indeed, as
shown in Appendix B, this is not the case when the figure
of merit is based on the trace distance or the square root
fidelity.

We now provide necessary and sufficient conditions
that a measure of distance or fidelity between states has
to satisfy in order for the Kantorovich measure based
on it to be monotonic under measurements. We will denote
by $D^K_d$ the Kantorovich distance based on a distance
$d(\rho,\sigma)$ between states, which is defined as in Eq. (12)
with $d$ in the place of $\Delta$. Similarly, by $F^K_f$ we will denote
the Kantorovich fidelity based on a fidelity $f(\rho,\sigma)$
between states.

Theorem 2 (Conditions for monotonicity of the
Kantorovich measures under generalized measurements).
Let $d(\rho,\sigma)$ and $f(\rho,\sigma)$ be normalized
distance and fidelity between states, which are monotonic
under CPTP maps and jointly convex (concave).
The Kantorovich distance $D^K_d(P,Q)$ or fidelity $F^K_f(P,Q)$
based on $d(\rho,\sigma)$ and $f(\rho,\sigma)$, respectively, is monotonic
under generalized measurements if and only if for every
two states of the form $\sum_i p_i\rho_i \otimes |i\rangle\langle i|$ and $\sum_i q_i\sigma_i \otimes |i\rangle\langle i|$,
where $\{|i\rangle\}$ is an orthonormal set of states, the distance
and fidelity satisfy

$$d\Bigl(\sum_i p_i\rho_i \otimes |i\rangle\langle i|,\ \sum_i q_i\sigma_i \otimes |i\rangle\langle i|\Bigr) = \sum_i \Bigl(\min(p_i,q_i)\,d(\rho_i,\sigma_i) + \frac{1}{2}|p_i - q_i|\Bigr), \tag{71}$$

and

$$f\Bigl(\sum_i p_i\rho_i \otimes |i\rangle\langle i|,\ \sum_i q_i\sigma_i \otimes |i\rangle\langle i|\Bigr) = \sum_i \min(p_i,q_i)\,f(\rho_i,\sigma_i), \tag{72}$$

respectively.

Proof. The proof is presented in Appendix C.

Comment 1. This theorem is a statement regard- 
ing the relation between the values of a given measure 
(distance or fidelity) between states over Hilbert spaces 
of different dimensions. Note that if a measure has a 
well-defined operational interpretation formulated with- 
out reference to the dimension of the Hilbert space (to 
the best of our knowledge, this is the case for all known 
measures of distance and fidelity between states), that 
measure is automatically defined for any dimension. The 
property of monotonicity that we are interested in is also 
dimension-independent. We remark that the above theorem
concerns distance and fidelity measures between
states which are monotonic under CPTP maps without
the restriction that the CPTP maps preserve the dimension
of the Hilbert space since we are interested in proving
monotonicity under the most general type of quantum
operations. One can easily see that monotonicity under
CPTP maps that can increase the dimension is equivalent
to monotonicity under dimension-preserving CPTP
maps plus the stability condition $d(\rho,\sigma) = d(\rho\otimes\kappa, \sigma\otimes\kappa)$
and $f(\rho,\sigma) = f(\rho\otimes\kappa, \sigma\otimes\kappa)$ for all $\rho,\sigma \in \mathcal{B}(\mathcal{H})$ and
$\kappa \in \mathcal{B}(\mathcal{H}')$, where $\mathcal{H}$ and $\mathcal{H}'$ are arbitrary Hilbert spaces.
Similarly, monotonicity under CPTP maps that can decrease
the dimension is equivalent to monotonicity under
dimension-preserving CPTP maps plus monotonicity under
partial tracing.

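Comment 2. The third Jozsa axiom states that a
fidelity function should satisfy [3?J

$$f(\rho, |\psi\rangle\langle\psi|) = \langle\psi|\rho|\psi\rangle. \tag{73}$$

The square root fidelity we have considered above satisfies
a modified version of that axiom, namely,

$$F(\rho, |\psi\rangle\langle\psi|) = \sqrt{\langle\psi|\rho|\psi\rangle}. \tag{74}$$

But one can see that if the fidelity $f$ satisfies Eq. (72), it
must satisfy

$$f\Bigl(\sum_j p_j\rho_j \otimes |j\rangle\langle j|,\ |\psi\rangle\langle\psi| \otimes |i\rangle\langle i|\Bigr) = p_i\, f\bigl(\rho_i \otimes |i\rangle\langle i|,\ |\psi\rangle\langle\psi| \otimes |i\rangle\langle i|\bigr), \tag{75}$$

which can be consistent only with Eq. (73) and not with
Eq. (74). This rules out a class of possible fidelity functions.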

A natural question to ask is whether there actually ex- 
ist measures of distance or fidelity between states that 
satisfy the conditions of the theorem and thereby would 
give rise to Kantorovich measures that are monotonic 
under generalized measurements. We leave this problem 
open for future investigation. Instead, in the next sec- 
tion we propose distance and fidelity between ensembles 
which are based on the trace distance and the square root 
fidelity but are not of the Kantorovich type and satisfy 
the desired monotonicity. 

V. DISTANCE AND FIDELITY BASED ON THE 
EXTENDED-HILBERT-SPACE 
REPRESENTATION OF ENSEMBLES 

A. Motivating the definitions 

In this section, we adopt a different approach to 
defining measures between ensembles of quantum states, 
which is based on the extended-Hilbert-space (EHS) rep- 
resentation of ensembles that we briefly touched upon in 
Sec. II. As we pointed out, an ensemble describes states 
occurring randomly according to some probability distri- 
bution, but an indispensable part of the ensemble is the 
classical side information about the identity of the given 
state. The idea behind the EHS representation is that 
the classical system storing that information is ultimately 
quantum and therefore it must be possible to describe it 
in the language of quantum mechanics. In the original
formulation of the EHS representation [19], an ensemble
of the form $\{(p_x,\rho_x)\}$ is represented in terms of a state
of the form $\hat{\rho} = \sum_x p_x\rho_x \otimes [x]$ (Eq. (8)). When only a
single ensemble is involved, this representation is sufficient
and it is not important what the pointer (or flag)
states $[x] = |x\rangle\langle x|$ are, as long as they form an orthonormal
set and each $[x]$ is unambiguously associated with
$\rho_x$. However, if we want to use the EHS idea to compare
two ensembles, we need to go beyond this simple formulation.
In Sec. III, we already saw one example where
a naive application of this idea fails. Namely, we argued
that if we represent two ensembles $P(\rho)$ and $Q(\rho)$, $\rho \in \Omega$,
by the states $\sum_{\rho\in\Omega} P(\rho)\rho \otimes [\rho]$ and $\sum_{\rho\in\Omega} Q(\rho)\rho \otimes [\rho]$, a
distance or fidelity between these EHS representations is
equivalent to a distance or fidelity between the probabil- 
ity distributions P(p) and Q(p) in which p is treated as 
a classical variable. Such a measure does not capture the 
idea of closeness between different quantum states. In 
this section, we will provide a generalized formulation of 
an EHS representation of an ensemble, which will allow 
us to define measures of distance and fidelity between 
ensembles that possess all properties that we would like 
such measures to have. 
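
The failure of the naive representation can be checked numerically. The following is a minimal sketch, assuming real qubit states and helper names (`ket`, `proj`, `trace_norm`, `naive_ehs_distance`) that are introduced here for illustration only. Two ensembles concentrated on two nearly identical states are at naive EHS distance 1, which is just the Kolmogorov distance between the probability vectors, even though the states themselves are close.

```python
import math

def ket(theta):
    """Real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def proj(v):
    """Density matrix |v><v| as a 2x2 nested list."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def trace_norm(m):
    """Trace norm (sum of |eigenvalues|) of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return abs(mean + rad) + abs(mean - rad)

def weighted_diff(a, x, b, y):
    """Matrix a*x - b*y for scalars a, b and 2x2 matrices x, y."""
    return [[a * x[i][j] - b * y[i][j] for j in range(2)] for i in range(2)]

def naive_ehs_distance(P, Q, states):
    """Trace distance between sum_k P[k] rho_k (x) [k] and sum_k Q[k] rho_k (x) [k].

    Both operators are block diagonal in the pointer index k, so the trace
    norm of their difference is the sum of the trace norms of the blocks.
    """
    blocks = (weighted_diff(P[k], states[k], Q[k], states[k])
              for k in range(len(states)))
    return 0.5 * sum(trace_norm(b) for b in blocks)

# Two nearly identical states; each ensemble is concentrated on one of them.
states = [proj(ket(0.0)), proj(ket(0.1))]
P, Q = [1.0, 0.0], [0.0, 1.0]

naive = naive_ehs_distance(P, Q, states)
kolmogorov = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
direct = 0.5 * trace_norm(weighted_diff(1.0, states[0], 1.0, states[1]))
print(naive, kolmogorov, direct)
```

The first two printed numbers coincide, while the third (the trace distance between the two states) is much smaller, which is the mismatch the generalized formulation is meant to remove.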

For this purpose, it is convenient to introduce the notion
of a 'classical' system whose states live in a 'classical'
space which we define to be a fixed set $\Omega_c$ of orthogonal
pure states $[c]$, $\mathrm{Tr}([c][c']) = \delta_{cc'}$, where we use the notation
$[c] = |c\rangle\langle c|$ to distinguish the states of the 'classical'
system from the states of the quantum system. Generally,
the classical space can consist of infinitely many different
states, but later we will see that it suffices to consider
a classical space of cardinality $|\Omega_c| = |\Omega|^2$, where
$|\Omega|$ is the cardinality of the set $\Omega$ of density matrices
participating in the ensembles.

Given the classical system described by the classical
space $\Omega_c$ and a set $\Omega$ of states of a quantum system, we
can ask what are the most general states of the quantum-classical
system that represent an ensemble $P(\rho)$, $\rho \in \Omega$,
consistently with our notion of ensemble. As we pointed
out, the information about the identity of a quantum
state from the ensemble must be stored in the classical
system in a way which allows one to unambiguously
identify the state by measuring the state of the classical
system. If we take this to be the definition of a valid EHS
representation, then we should allow for the possibility
that several flag states $\{[c_i(\rho)]\}$ point at the same quantum
state as long as every flag state is associated with a
single quantum state and, of course, each quantum state
$\rho$ still appears with the correct total probability. More
succinctly, the most general EHS representation should
allow for mixed flag states, i.e.,

$$\hat{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\, \rho \otimes \Big( \sum_i p_i(\rho)\, [c_i(\rho)] \Big). \qquad (76)$$

Having a quantum-classical state of this form is equivalent
to having the ensemble $\{(P(\rho), \rho)\}$ because by measuring
the state of the classical system, we can infer which
state from the ensemble we are given, and given a state
drawn randomly from the ensemble we can always prepare
the state (76) by attaching the corresponding classical
state and discarding any additional information. Note
that in the expression (76) we have written the classical
states as $[c_i(\rho)]$, explicitly indicating which classical
states are associated with the quantum state $\rho$, but it
is convenient to express the condition that every pointer
state is associated with a unique $\rho \in \Omega$ as a condition on
a general state of the quantum-classical system.

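Definition 4 (EHS representation of an ensemble).
An EHS representation of an ensemble $P(\rho)$,
$\rho \in \Omega$, is a quantum-classical state of the form

$$\hat{\rho} = \sum_{\rho\in\Omega} \sum_{[c]\in\Omega_c} P(\rho,[c])\, \rho \otimes [c], \qquad (77)$$

for which the non-negative quantities $P(\rho,[c])$ satisfy

$$\sum_{[c]\in\Omega_c} P(\rho,[c]) = P(\rho), \quad \forall \rho \in \Omega, \qquad (78)$$

$$P(\rho,[c])\,P(\sigma,[c]) = 0, \quad \forall \rho,\sigma \in \Omega,\ \rho \neq \sigma,\ \forall [c] \in \Omega_c. \qquad (79)$$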

Equation (78) ensures that every quantum state $\rho \in \Omega$
occurs with the correct probability $P(\rho)$ and Eq. (79) expresses
the fact that a given pointer state $[c]$ in $\Omega_c$ cannot
be associated with more than one state in $\Omega$. In other
words, there exists an injective function $\zeta : \Omega \to \Omega_c$
which specifies the pointer states associated with a given
$\rho \in \Omega$, and $P(\rho,[c]) = 0$ if $\zeta^{-1}([c]) \neq \rho$. It is important
to note that a given ensemble can be encoded using
many different injections. If two ensembles $P$ and
$Q$ are encoded using injections $\zeta_P$ and $\zeta_Q$ which map
the space $\Omega$ to two non-overlapping subsets of $\Omega_c$, the
corresponding EHS representations of the two ensembles
would be completely orthogonal and therefore perfectly
distinguishable. However, if the sets of quantum states
participating in the two ensembles are not orthogonal,
one can always choose two EHS representations of the two
ensembles which have a non-zero overlap because one can
assign one and the same pointer to two non-orthogonal
states from the two ensembles. At the same time, unless
the two ensembles are identical, their EHS representations
cannot be made identical. This suggests a way of
defining distance and fidelity between ensembles based
on an optimal choice of their EHS representations. 
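
The two conditions of Definition 4 are easy to test mechanically. The sketch below assumes the weights $P(\rho,[c])$ are stored as a table `joint[k][c]` with one row per quantum state and one column per pointer; the function name `is_valid_ehs` and the table layout are illustrative choices, not part of the original formulation.

```python
def is_valid_ehs(joint, P, tol=1e-12):
    """Check conditions (78) and (79) for joint[k][c] = P(rho_k, [c]).

    joint: list of rows, one per quantum state rho_k, one column per pointer [c].
    P:     list of ensemble probabilities P(rho_k).
    """
    # The weights must be non-negative.
    if any(x < -tol for row in joint for x in row):
        return False
    # Eq. (78): summing over the pointers must reproduce P(rho_k).
    if any(abs(sum(row) - p) > tol for row, p in zip(joint, P)):
        return False
    # Eq. (79): a pointer may carry weight for at most one quantum state.
    for c in range(len(joint[0])):
        if sum(1 for row in joint if row[c] > tol) > 1:
            return False
    return True

P = [0.5, 0.5]
several_flags = [[0.3, 0.2, 0.0], [0.0, 0.0, 0.5]]  # rho_0 is flagged by two pointers
ambiguous = [[0.3, 0.2, 0.0], [0.0, 0.2, 0.3]]      # pointer 1 points at both states
print(is_valid_ehs(several_flags, P), is_valid_ehs(ambiguous, P))
```

The first table is a valid representation with a mixed flag state for the first quantum state; the second violates Eq. (79) because one pointer is shared by two different states of the same ensemble.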

Definition 5 (EHS distance between ensembles).
The EHS distance between the ensembles $P(\rho)$ and $Q(\rho)$,
$\rho \in \Omega$, is

$$D^{\mathrm{EHS}}(P,Q) = \min_{\hat{\rho},\hat{\sigma}} \Delta(\hat{\rho},\hat{\sigma}), \qquad (80)$$

where $\Delta$ is the trace distance (Eq. (2)), and the minimum is
taken over all EHS representations $\hat{\rho}$ and $\hat{\sigma}$ of $P(\rho)$ and
$Q(\rho)$, respectively.

Definition 6 (EHS fidelity between ensembles).
The EHS fidelity between the ensembles $P(\rho)$ and $Q(\rho)$,
$\rho \in \Omega$, is

$$F^{\mathrm{EHS}}(P,Q) = \max_{\hat{\rho},\hat{\sigma}} F(\hat{\rho},\hat{\sigma}), \qquad (81)$$

where $F$ is the square root fidelity (Eq. (1)), and the maximum
is taken over all EHS representations $\hat{\rho}$ and $\hat{\sigma}$ of
$P(\rho)$ and $Q(\rho)$, respectively.

Before we proceed with studying the properties of these 
measures, it is convenient to present two equivalent for- 
mulations of the above definitions. 

Lemma 2 (Equivalent form of the EHS distance).
The EHS distance (80) is equivalent to

$$D^{\mathrm{EHS}}(P,Q) = \min_{P(\rho,\sigma),\,Q(\rho,\sigma)} \Delta\Big( \sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big), \qquad (82)$$

where the minimum is taken over pairs of joint probability
distributions $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ such that the left
marginal of $P(\rho,\sigma)$ is equal to $P(\rho)$ and the right
marginal of $Q(\rho,\sigma)$ is equal to $Q(\sigma)$. The set of pointer
states $[\rho\sigma]$ is fixed and has cardinality equal to the square
of the cardinality of $\Omega$.

Proof. First, observe that for any two EHS representations
$\hat{\rho}$ and $\hat{\sigma}$ of $P$ and $Q$, the distance $\Delta(\hat{\rho},\hat{\sigma})$ has the
form

$$\Delta(\hat{\rho},\hat{\sigma}) = \Delta\Big( \sum_{\rho\in\Omega} \sum_{[c]\in\Omega_c} P(\rho,[c])\, \rho \otimes [c],\ \sum_{\rho\in\Omega} \sum_{[c]\in\Omega_c} Q(\rho,[c])\, \rho \otimes [c] \Big), \qquad (83)$$

where $P(\rho,[c])$ and $Q(\rho,[c])$ are consistent with Definition 4.
It can generally happen that one and the same
pointer $[c]$ is attached to a state $\rho$ from the first ensemble
and to a state $\sigma$ from the second ensemble, that
is, $P(\rho,[c]) \neq 0$ and $Q(\sigma,[c]) \neq 0$. However, having a
pair of states $\rho$ and $\sigma$ from the first and second ensembles,
respectively, attached simultaneously to more than
one pointer, does not help in attaining the minimum in
Eq. (80). This follows from the fact that we could replace
the second pointer by the first one, which would result
in valid EHS representations of the two ensembles. But
the latter operation also corresponds to a CPTP map
on the states in the extended Hilbert space, and since
$\Delta$ is monotonic under CPTP maps, the resultant representations
will be at least as close. Therefore, without loss of
generality, we can assume that every pair of states $\rho$ and
$\sigma$ from the first and second ensemble, respectively, is associated
with a single pointer state, which we will label
by $[\rho\sigma]$. This implies that the minimum in Eq. (80) can
be taken over EHS representations of $P$ and $Q$ of the
form $\sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma]$ and $\sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma]$,
where the condition of consistency with the original distributions
$P$ and $Q$ amounts to conditions on the left and
right marginals of $P(\rho,\sigma)$ and $Q(\rho,\sigma)$, respectively:

$$\sum_{\sigma} P(\rho,\sigma) = P(\rho), \qquad (84)$$

$$\sum_{\rho} Q(\rho,\sigma) = Q(\sigma). \qquad (85)$$

This completes the proof.

Lemma 3 (Equivalent form of the EHS fidelity).
The EHS fidelity (81) is equivalent to

$$F^{\mathrm{EHS}}(P,Q) = \max_{P(\rho,\sigma),\,Q(\rho,\sigma)} F\Big( \sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big), \qquad (86)$$

where the maximum is taken over pairs of joint probability
distributions $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ such that the left
marginal of $P(\rho,\sigma)$ is equal to $P(\rho)$ and the right
marginal of $Q(\rho,\sigma)$ is equal to $Q(\sigma)$. The set of pointer
states $[\rho\sigma]$ is fixed and has cardinality equal to the square
of the cardinality of $\Omega$.

Proof. The proof is analogous to the proof of Lemma 2.

Corollary (Formulation without reference to an
extended Hilbert space). Considering the explicit
forms of the trace distance and the square root fidelity,
one can see that Eqs. (82) and (86) can be written without
reference to the classical pointer system:

$$D^{\mathrm{EHS}}(P,Q) = \frac{1}{2} \min_{P(\rho,\sigma),\,Q(\rho,\sigma)} \sum_{\rho,\sigma\in\Omega} \big\| P(\rho,\sigma)\rho - Q(\rho,\sigma)\sigma \big\|, \qquad (87)$$

$$F^{\mathrm{EHS}}(P,Q) = \max_{P(\rho,\sigma),\,Q(\rho,\sigma)} \sum_{\rho,\sigma\in\Omega} \sqrt{P(\rho,\sigma)Q(\rho,\sigma)}\, F(\rho,\sigma), \qquad (88)$$

where optimization is taken over all joint distributions
$P(\rho,\sigma)$ with left marginal $P(\rho)$ and $Q(\rho,\sigma)$ with right
marginal $Q(\sigma)$.

B. Properties of the EHS distance

Property 1 (Positivity).

$$D^{\mathrm{EHS}}(P,Q) \geq 0, \quad \forall P,Q \in \mathcal{P}_\Omega, \qquad (89)$$

with equality

$$D^{\mathrm{EHS}}(P,Q) = 0 \ \text{ iff } \ P(\rho) = Q(\rho), \quad \forall \rho \in \Omega. \qquad (90)$$

Proof. The EHS distance is obviously non-negative
since $\Delta(\hat{\rho},\hat{\sigma}) \geq 0$. If both ensembles are the same,
$P(\rho) = Q(\rho)$, $\forall \rho \in \Omega$, clearly $D^{\mathrm{EHS}}(P,Q) = 0$, because
we can choose identical EHS representations for both ensembles.
Conversely, if $D^{\mathrm{EHS}}(P,Q) = 0$, the optimal
EHS representations of $P$ and $Q$ must be identical,
which means that $P$ and $Q$ must be the same.

Property 2 (Normalization).

$$D^{\mathrm{EHS}}(P,Q) \leq 1, \quad \forall P,Q \in \mathcal{P}_\Omega, \qquad (91)$$

with equality

$$D^{\mathrm{EHS}}(P,Q) = 1 \qquad (92)$$

if and only if the supports of $P$ and $Q$ are orthogonal sets
of states.

Proof. Since $\Delta(\hat{\rho},\hat{\sigma}) \leq 1$, obviously $D^{\mathrm{EHS}}(P,Q) \leq 1$.
If $P$ and $Q$ have supports on orthogonal sets of states,
then all of their EHS representations will also be orthogonal,
which implies $D^{\mathrm{EHS}}(P,Q) = 1$. Conversely, if
$D^{\mathrm{EHS}}(P,Q) = 1$, this means that the EHS states for
which the minimum in Eq. (80) is achieved must be
orthogonal. But unless $P$ and $Q$ have supports on orthogonal
sets of states, it is always possible to find EHS
representations of $P$ and $Q$ which have non-zero overlap
because we can assign one and the same pointer to two
non-orthogonal states from the two different ensembles.

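Property 3 (Symmetry).

$$D^{\mathrm{EHS}}(P,Q) = D^{\mathrm{EHS}}(Q,P), \quad \forall P,Q \in \mathcal{P}_\Omega. \qquad (93)$$

Proof. The symmetry follows from Eq. (82)
and the symmetry of $\Delta(\hat{\rho},\hat{\sigma})$.

Property 4 (Triangle inequality).

$$D^{\mathrm{EHS}}(P,R) \leq D^{\mathrm{EHS}}(P,Q) + D^{\mathrm{EHS}}(Q,R), \qquad (94)$$

$$\forall P,Q,R \in \mathcal{P}_\Omega. \qquad (95)$$

Proof. The proof is presented in Appendix D.
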
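Property 5 (Joint convexity).

$$D^{\mathrm{EHS}}\big(pP_1 + (1-p)P_2,\ pQ_1 + (1-p)Q_2\big) \leq p\,D^{\mathrm{EHS}}(P_1,Q_1) + (1-p)\,D^{\mathrm{EHS}}(P_2,Q_2), \qquad (96)$$

$$\forall P_1,P_2,Q_1,Q_2 \in \mathcal{P}_\Omega, \quad \forall p \in [0,1].$$

Proof. Let

$$D^{\mathrm{EHS}}(P_1,Q_1) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} P_1(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q_1(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big) \qquad (97)$$

and

$$D^{\mathrm{EHS}}(P_2,Q_2) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} P_2(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q_2(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big), \qquad (98)$$

where the joint distributions $P_1(\rho,\sigma)$ and $P_2(\rho,\sigma)$ have
left marginals $P_1(\rho)$ and $P_2(\rho)$, respectively, and the joint
distributions $Q_1(\rho,\sigma)$ and $Q_2(\rho,\sigma)$ have right marginals
$Q_1(\sigma)$ and $Q_2(\sigma)$, respectively. Since $\Delta$ is jointly convex,
we have

$$p\,D^{\mathrm{EHS}}(P_1,Q_1) + (1-p)\,D^{\mathrm{EHS}}(P_2,Q_2) \geq \Delta\Big( \sum_{\rho,\sigma\in\Omega} \big(pP_1(\rho,\sigma) + (1-p)P_2(\rho,\sigma)\big)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} \big(pQ_1(\rho,\sigma) + (1-p)Q_2(\rho,\sigma)\big)\, \sigma \otimes [\rho\sigma] \Big). \qquad (99)$$

But obviously $pP_1(\rho,\sigma) + (1-p)P_2(\rho,\sigma)$ is a joint distribution
with left marginal $pP_1(\rho) + (1-p)P_2(\rho)$, and
$pQ_1(\rho,\sigma) + (1-p)Q_2(\rho,\sigma)$ is a joint distribution with
right marginal $pQ_1(\sigma) + (1-p)Q_2(\sigma)$. Therefore, the
quantity on the right-hand side of Eq. (99) is greater
than or equal to $D^{\mathrm{EHS}}(pP_1 + (1-p)P_2,\ pQ_1 + (1-p)Q_2)$,
which completes the proof.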

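Property 6 (Monotonicity under generalized
measurements). $D^{\mathrm{EHS}}(P,Q)$ is monotonic under generalized
measurements in the sense of Definition 3,
$D^{\mathrm{EHS}}(P,Q) \geq D^{\mathrm{EHS}}(M(P),M(Q))$.

Proof. Let $P(\rho)$ and $Q(\rho)$, $\rho \in \Omega$, be two ensembles of
quantum states, and let

$$D^{\mathrm{EHS}}(P,Q) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big). \qquad (100)$$

Let $\{\mathcal{M}_i\}$, $\mathcal{M}_i(\rho) = \sum_j M_{ij}\, \rho\, M_{ij}^\dagger$, be the measurement
superoperators of a generalized measurement $M$,
$\sum_{i,j} M_{ij}^\dagger M_{ij} = I$. Consider the following CPTP map:

$$\rho \rightarrow \sum_i \mathcal{M}_i(\rho) \otimes [i], \qquad (101)$$

where $\{[i]\}$ is an orthonormal set of pure states in the
Hilbert space of some additional system. Since $\Delta$ is
monotonic under CPTP maps, we have

$$D^{\mathrm{EHS}}(P,Q) \geq \Delta\Big( \sum_{\rho,\sigma\in\Omega} \sum_i P(\rho,\sigma)\, \mathcal{M}_i(\rho) \otimes [\rho\sigma i],\ \sum_{\rho,\sigma\in\Omega} \sum_i Q(\rho,\sigma)\, \mathcal{M}_i(\sigma) \otimes [\rho\sigma i] \Big)$$

$$= \Delta\Big( \sum_{\rho,\sigma\in\Omega} \sum_i P(\rho,\sigma)\, \mathrm{Tr}(\mathcal{M}_i(\rho))\, \frac{\mathcal{M}_i(\rho)}{\mathrm{Tr}(\mathcal{M}_i(\rho))} \otimes [\rho\sigma i],\ \sum_{\rho,\sigma\in\Omega} \sum_i Q(\rho,\sigma)\, \mathrm{Tr}(\mathcal{M}_i(\sigma))\, \frac{\mathcal{M}_i(\sigma)}{\mathrm{Tr}(\mathcal{M}_i(\sigma))} \otimes [\rho\sigma i] \Big)$$

$$\geq D^{\mathrm{EHS}}(M(P),M(Q)), \qquad (102)$$

where $M : \mathcal{P}_\Omega \rightarrow \mathcal{P}_{\Omega_M}$ is the map induced
by the measurement as explained in Definition 3.
The last inequality follows from the fact that
$\sum_{\rho,\sigma\in\Omega} \sum_i P(\rho,\sigma)\, \mathrm{Tr}(\mathcal{M}_i(\rho))\, \frac{\mathcal{M}_i(\rho)}{\mathrm{Tr}(\mathcal{M}_i(\rho))} \otimes [\rho\sigma i]$ and
$\sum_{\rho,\sigma\in\Omega} \sum_i Q(\rho,\sigma)\, \mathrm{Tr}(\mathcal{M}_i(\sigma))\, \frac{\mathcal{M}_i(\sigma)}{\mathrm{Tr}(\mathcal{M}_i(\sigma))} \otimes [\rho\sigma i]$ are EHS
representations of the new ensembles $M(P)$ and $M(Q)$.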

Corollary (Monotonicity under CPTP maps
and invariance under unitary maps). Property 6
obviously implies monotonicity under CPTP maps, which
can be regarded as a special type of generalized mea- 
surements. This in turn implies invariance under unitary 
maps since the latter are reversible CPTP maps. 

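Property 7 (Monotonicity under averaging).
Let $\bar{P}$ denote the singleton ensemble consisting of the
average state of $P(\rho)$, $\bar{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\rho$. Then

$$D^{\mathrm{EHS}}(P,Q) \geq D^{\mathrm{EHS}}(\bar{P},\bar{Q}). \qquad (103)$$

Proof. Let

$$D^{\mathrm{EHS}}(P,Q) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big). \qquad (104)$$

Observe that

$$\bar{\rho}_P = \mathrm{Tr}_c\Big( \sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma] \Big) \qquad (105)$$

and

$$\bar{\rho}_Q = \mathrm{Tr}_c\Big( \sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big), \qquad (106)$$

where $\mathrm{Tr}_c$ denotes partial tracing over the subsystem
containing the classical pointers $\{[\rho\sigma]\}$. On the other
hand, $\Delta(\bar{\rho}_P,\bar{\rho}_Q) = D^{\mathrm{EHS}}(\bar{P},\bar{Q})$ (see Eq. (116) below). Since
$\Delta$ is monotonic under partial tracing (which is a
CPTP map), the property follows.

Corollary. If two distributions are close, their average
states are also close, i.e.,

$$\text{if } D^{\mathrm{EHS}}(P,Q) \leq \epsilon, \text{ then } \Delta(\bar{\rho}_P,\bar{\rho}_Q) \leq \epsilon. \qquad (107)$$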

Property 8 (Continuity of the average of a continuous
function). Let $h(\rho)$ be a bounded function,
which is continuous with respect to the distance $\Delta$. Then
the ensemble average of $h(\rho)$, $\bar{h}_P = \sum_{\rho\in\Omega} P(\rho)h(\rho)$, is
continuous with respect to $D^{\mathrm{EHS}}$.

Proof. The proof is presented in Appendix E. 

Comment. Again, as we pointed out in relation to the 
Kantorovich distance, Property 8 naturally reflects the 
idea of states as resources — if a resource is a continuous 
function of the state, when two ensembles are close, their 
average resources must also be close. 

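Property 9 (The EHS distance is upper
bounded by the Kantorovich distance).

$$D^{\mathrm{EHS}}(P,Q) \leq D^{K}(P,Q). \qquad (108)$$

Proof. Let $\Pi(\rho,\sigma)$ be a joint probability distribution
with left and right marginals $P(\rho)$ and $Q(\sigma)$ for which the
minimum in the definition of the Kantorovich distance $D^{K}(P,Q)$
(Sec. IV) is attained. Obviously, the minimum in Eq. (82) satisfies

$$D^{\mathrm{EHS}}(P,Q) \leq \Delta\Big( \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big) = \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\, \Delta(\rho,\sigma) = D^{K}(P,Q). \qquad (109)$$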
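
The middle equality in Eq. (109) can be illustrated numerically. The sketch below assumes real qubit states and uses illustrative helper names; the coupling `coupling` is just one joint distribution with the right marginals, not necessarily the optimal one, so the printed number is an upper bound on $D^{\mathrm{EHS}}$ that coincides with the transport cost of that coupling.

```python
import math

def ket(theta):
    """Real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def proj(v):
    """Density matrix |v><v| as a 2x2 nested list."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def trace_norm(m):
    """Trace norm of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return abs(mean + rad) + abs(mean - rad)

def combo(a, x, b, y):
    """Matrix a*x - b*y."""
    return [[a * x[i][j] - b * y[i][j] for j in range(2)] for i in range(2)]

def ehs_objective(Pj, Qj, states):
    """(1/2) sum_{r,s} || Pj[r][s] rho_r - Qj[r][s] rho_s ||, the objective of Eq. (87)."""
    n = len(states)
    return 0.5 * sum(trace_norm(combo(Pj[r][s], states[r], Qj[r][s], states[s]))
                     for r in range(n) for s in range(n))

def transport_cost(coupling, states):
    """sum_{r,s} coupling[r][s] * Delta(rho_r, rho_s)."""
    n = len(states)
    return sum(coupling[r][s] * 0.5 * trace_norm(combo(1.0, states[r], 1.0, states[s]))
               for r in range(n) for s in range(n))

states = [proj(ket(0.0)), proj(ket(math.pi / 3))]
# Rows sum to P = (0.5, 0.5); columns sum to Q = (0.8, 0.2).
coupling = [[0.5, 0.0], [0.3, 0.2]]

upper_bound = ehs_objective(coupling, coupling, states)
cost = transport_cost(coupling, states)
print(upper_bound, cost)
```

Choosing $P(\rho,\sigma) = Q(\rho,\sigma) = \Pi(\rho,\sigma)$ is what turns the objective of Eq. (87) into a transport cost; allowing the two joint distributions to differ can only lower the minimum.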

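Property 10 (Stability). Let $P(\rho)$ and $Q(\rho)$ be two
ensembles of states in $\Omega$ and $R(\sigma')$ be an ensemble of
states in $\Omega'$, where $\Omega$ and $\Omega'$ are sets of states of two
different systems. Then,

$$D^{\mathrm{EHS}}(P \otimes R, Q \otimes R) = D^{\mathrm{EHS}}(P,Q). \qquad (110)$$

Proof. Let

$$D^{\mathrm{EHS}}(P \otimes R, Q \otimes R) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} \sum_{\tau',\kappa'\in\Omega'} \Pi(\rho \otimes \tau', \sigma \otimes \kappa')\, \rho \otimes \tau' \otimes [\rho\tau'\sigma\kappa'],\ \sum_{\rho,\sigma\in\Omega} \sum_{\tau',\kappa'\in\Omega'} J(\rho \otimes \tau', \sigma \otimes \kappa')\, \sigma \otimes \kappa' \otimes [\rho\tau'\sigma\kappa'] \Big), \qquad (111)$$

where $\Pi(\rho \otimes \tau', \sigma \otimes \kappa')$ has left marginal $P(\rho)R(\tau')$ and
$J(\rho \otimes \tau', \sigma \otimes \kappa')$ has right marginal $Q(\sigma)R(\kappa')$.

One can readily see that the monotonicity of $\Delta$ under
partial tracing implies

$$D^{\mathrm{EHS}}(P \otimes R, Q \otimes R) \geq D^{\mathrm{EHS}}(P,Q). \qquad (112)$$

Using the stability of $\Delta$, we see that if we choose
$\Pi(\rho \otimes \tau', \sigma \otimes \kappa') = P(\rho,\sigma)R(\tau')\delta_{\tau'\kappa'}$ and $J(\rho \otimes \tau', \sigma \otimes \kappa') =
Q(\rho,\sigma)R(\tau')\delta_{\tau'\kappa'}$, where $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ are two
joint distributions for which the minimum in Eq. (82)
is attained, we obtain

$$D^{\mathrm{EHS}}(P \otimes R, Q \otimes R) \leq D^{\mathrm{EHS}}(P,Q), \qquad (113)$$

which together with Eq. (112) implies Eq. (110).

This property can also be seen to follow from Property
6 because one can go from $P$ and $Q$ to $P \otimes R$ and $Q \otimes R$,
respectively, and vice versa, via stochastic operations.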

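Property 11 (Convex optimization). The task of
finding the optimal $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ in Eq. (82) is a
convex optimization problem.

Proof. We can think of $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ as the
components of a vector $x$ of dimension $2N^2$, where $N$ is
the cardinality of the set $\Omega$. The first $N^2$ components
of the vector are equal to $P(\rho,\sigma)$ and the second $N^2$
components are equal to $Q(\rho,\sigma)$. The convexity of the
function

$$\varepsilon(x) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} P(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big) \qquad (114)$$

can be seen from the fact that for any $x_1$, $x_2$, and $t$,
$0 \leq t \leq 1$, we have

$$\varepsilon(t x_1 + (1-t) x_2) = \Delta\Big( \sum_{\rho,\sigma\in\Omega} \big(tP_1(\rho,\sigma) + (1-t)P_2(\rho,\sigma)\big)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} \big(tQ_1(\rho,\sigma) + (1-t)Q_2(\rho,\sigma)\big)\, \sigma \otimes [\rho\sigma] \Big)$$

$$\leq t\, \Delta\Big( \sum_{\rho,\sigma\in\Omega} P_1(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q_1(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big) + (1-t)\, \Delta\Big( \sum_{\rho,\sigma\in\Omega} P_2(\rho,\sigma)\, \rho \otimes [\rho\sigma],\ \sum_{\rho,\sigma\in\Omega} Q_2(\rho,\sigma)\, \sigma \otimes [\rho\sigma] \Big)$$

$$= t\,\varepsilon(x_1) + (1-t)\,\varepsilon(x_2), \qquad (115)$$

due to the joint convexity of $\Delta$. Notice that if $P_1(\rho,\sigma)$
and $P_2(\rho,\sigma)$ have left marginals equal to $P(\rho)$, so does
$tP_1(\rho,\sigma) + (1-t)P_2(\rho,\sigma)$. Similarly, if $Q_1(\rho,\sigma)$ and
$Q_2(\rho,\sigma)$ have right marginals equal to $Q(\sigma)$, so does
$tQ_1(\rho,\sigma) + (1-t)Q_2(\rho,\sigma)$. Since the marginal conditions
on $x$ are linear, the problem of finding $x$ which minimizes
$\varepsilon(x)$ subject to these constraints is a convex optimization
problem, for which efficient numerical techniques exist.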
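
For a set $\Omega$ with only two states the feasible region is four-dimensional, so the minimization can be illustrated with a crude grid search. This is only a sketch under stated assumptions: real qubit states, illustrative helper names, and a brute-force scan in place of the efficient convex solvers alluded to above. The grid is chosen so that it happens to contain a minimizer for this particular example.

```python
import itertools
import math

def ket(theta):
    """Real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def proj(v):
    """Density matrix |v><v| as a 2x2 nested list."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def trace_norm(m):
    """Trace norm of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return abs(mean + rad) + abs(mean - rad)

def combo(a, x, b, y):
    """Matrix a*x - b*y."""
    return [[a * x[i][j] - b * y[i][j] for j in range(2)] for i in range(2)]

def ehs_objective(Pj, Qj, states):
    """Objective of Eq. (87) for given joint distributions Pj and Qj."""
    return 0.5 * sum(trace_norm(combo(Pj[r][s], states[r], Qj[r][s], states[s]))
                     for r in range(2) for s in range(2))

def brute_force_ehs_distance(P, Q, states, steps=6):
    """Scan joint distributions with left marginal P (rows) and right marginal Q (columns)."""
    grid = [k / steps for k in range(steps + 1)]
    best = float("inf")
    for a0, a1, b0, b1 in itertools.product(grid, repeat=4):
        Pj = [[a0 * P[0], (1 - a0) * P[0]], [a1 * P[1], (1 - a1) * P[1]]]
        Qj = [[b0 * Q[0], b1 * Q[1]], [(1 - b0) * Q[0], (1 - b1) * Q[1]]]
        best = min(best, ehs_objective(Pj, Qj, states))
    return best

states = [proj(ket(0.0)), proj(ket(math.pi / 3))]
P, Q = [0.5, 0.5], [0.75, 0.25]

best = brute_force_ehs_distance(P, Q, states)
# Property 7: the distance between the average states is a lower bound.
lower = 0.5 * trace_norm(combo(P[0] - Q[0], states[0], Q[1] - P[1], states[1]))
# The diagonal choice of joint distributions gives the Kolmogorov distance as an upper bound.
kolmogorov = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
print(lower, best, kolmogorov)
```

Every grid point is feasible, so each evaluated value is an upper bound on $D^{\mathrm{EHS}}(P,Q)$, while Property 7 supplies the lower bound; in this example the two meet.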

Limiting case 1 (Two singleton ensembles). If
$P(\rho) = \delta_{\rho\tau}$, $\rho,\tau \in \Omega$, and $Q(\rho) = \delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$, i.e., each
of the ensembles $P$ and $Q$ consists of only a single state,
then the distance between the ensembles is equal to the
distance between the respective states,

$$D^{\mathrm{EHS}}(P,Q) = \Delta(\tau,\sigma). \qquad (116)$$

Proof. Due to the monotonicity of $\Delta$ under partial
tracing over the pointer system, we have that $D^{\mathrm{EHS}}(P,Q) \geq
\Delta(\tau,\sigma)$. But clearly, equality is achievable because
we can choose the probability distributions in Eq. (82) as
$P(\kappa,\rho) = Q(\kappa,\rho) = \delta_{\kappa\tau}\delta_{\rho\sigma}$.

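Limiting case 2 (One singleton ensemble). Unlike
the Kantorovich distance, when the ensemble $Q(\rho)$
consists of only one state $\sigma$, i.e., $Q(\rho) = \delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$,
the EHS distance between $P(\rho)$ and $Q(\rho)$ is generally not
equal to the average distance between a state drawn from
the ensemble $P(\rho)$ and the state $\sigma$,

$$D^{\mathrm{EHS}}(P,Q) \neq \sum_{\rho\in\Omega} P(\rho)\, \Delta(\rho,\sigma). \qquad (117)$$

Proof. We provide a proof by counterexample. Let
the singleton ensemble consist of the state $\sigma_0 = \sigma \otimes |0\rangle\langle 0|$
and let the other ensemble consist of two states, $\rho_0 =
\tilde{\rho}_0 \otimes |0\rangle\langle 0|$ and $\rho_1 = \tilde{\rho}_1 \otimes |1\rangle\langle 1|$, with probabilities $p_0$ and
$p_1 = 1 - p_0$, respectively. The average distance between
the state $\sigma_0$ and the states from the other ensemble is

$$\Delta_{\mathrm{ave}} = p_0\, \Delta(\rho_0,\sigma_0) + p_1\, \Delta(\rho_1,\sigma_0) = p_0\, \Delta(\rho_0,\sigma_0) + p_1. \qquad (118)$$

However, if we choose the joint distributions $P(\rho,\sigma) =
p_0\, \delta_{\rho\rho_0}\delta_{\sigma\sigma_0} + p_1\, \delta_{\rho\rho_1}\delta_{\sigma\sigma_0}$ and $Q(\rho,\sigma) = \delta_{\rho\rho_0}\delta_{\sigma\sigma_0}$, we see from Eq. (87)
that

$$D^{\mathrm{EHS}}(P,Q) \leq \frac{1}{2} \| p_0\rho_0 - \sigma_0 \| + \frac{1}{2} p_1 \leq \frac{p_0}{2} \| \rho_0 - \sigma_0 \| + \frac{1}{2}(1 - p_0) \| \sigma_0 \| + \frac{1}{2} p_1 = p_0\, \frac{1}{2} \| \rho_0 - \sigma_0 \| + p_1 = \Delta_{\mathrm{ave}}. \qquad (119)$$

For an appropriate choice of $\rho_0$ and $\sigma_0$, the second
inequality can be made strict, which completes the proof.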
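
A qubit analogue of this counterexample can be evaluated directly. The sketch below is an illustration under assumptions of my own choosing: $\sigma = |0\rangle\langle 0|$, $\rho_0 = |+\rangle\langle +|$, $\rho_1 = |1\rangle\langle 1|$ (orthogonal to $\sigma$) and $p_0 = p_1 = 1/2$; the helper names are illustrative. It compares the first bound in Eq. (119) with the average distance of Eq. (118).

```python
import math

def ket(theta):
    """Real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def proj(v):
    """Density matrix |v><v| as a 2x2 nested list."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def trace_norm(m):
    """Trace norm of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return abs(mean + rad) + abs(mean - rad)

def combo(a, x, b, y):
    """Matrix a*x - b*y."""
    return [[a * x[i][j] - b * y[i][j] for j in range(2)] for i in range(2)]

def trace_distance(x, y):
    """Delta(x, y) = (1/2) || x - y ||."""
    return 0.5 * trace_norm(combo(1.0, x, 1.0, y))

sigma = proj(ket(0.0))         # |0><0|, the singleton ensemble
rho0 = proj(ket(math.pi / 2))  # |+><+|
rho1 = proj(ket(math.pi))      # |1><1|, orthogonal to sigma
p0, p1 = 0.5, 0.5

# Eq. (118): average distance between sigma and a state drawn from P.
average_distance = p0 * trace_distance(rho0, sigma) + p1 * trace_distance(rho1, sigma)
# First bound in Eq. (119): (1/2)||p0 rho0 - sigma|| + (1/2) p1.
ehs_upper_bound = 0.5 * trace_norm(combo(p0, rho0, 1.0, sigma)) + 0.5 * p1
print(ehs_upper_bound, average_distance)
```

The bound on the EHS distance comes out strictly below the average distance, so for these states the inequality in Eq. (119) is strict.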

Limiting case 3 (Classical distributions). If the
set $\Omega$ consists of perfectly distinguishable density matrices,
i.e., $\Delta(\rho,\sigma) = 1 - \delta_{\rho\sigma}$, $\forall \rho,\sigma \in \Omega$, then $D^{\mathrm{EHS}}(P,Q)$
reduces to the trace distance $\Delta(\bar{\rho}_P,\bar{\rho}_Q)$ between the density
matrices $\bar{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\rho$ and $\bar{\rho}_Q = \sum_{\rho\in\Omega} Q(\rho)\rho$,
which is equal to the Kolmogorov distance between the
classical probability distributions $P$ and $Q$,

$$D^{\mathrm{EHS}}(P,Q) = \frac{1}{2} \sum_{\rho\in\Omega} |P(\rho) - Q(\rho)|.$$

Proof. The property follows from the fact that via
CPTP maps one can go back and forth between any EHS
representations of the ensembles $P(\rho)$ and $Q(\rho)$, $\rho \in \Omega$,
and the states $\bar{\rho}_P$ and $\bar{\rho}_Q$.

C. Properties of the EHS fidelity 

The properties of the EHS fidelity (86) can be proven
analogously to the properties of the EHS distance, which
is why we present them without proof.

Property 1 (Positivity and normalization).

$$0 \leq F^{\mathrm{EHS}}(P,Q) \leq 1, \quad \forall P,Q \in \mathcal{P}_\Omega, \qquad (120)$$

with

$$F^{\mathrm{EHS}}(P,Q) = 1 \ \text{ iff } \ P(\rho) = Q(\rho), \quad \forall \rho \in \Omega, \qquad (121)$$

and

$$F^{\mathrm{EHS}}(P,Q) = 0 \qquad (122)$$

if and only if the supports of $P$ and $Q$ are orthogonal sets
of states.

Property 2 (Symmetry).

$$F^{\mathrm{EHS}}(P,Q) = F^{\mathrm{EHS}}(Q,P), \quad \forall P,Q \in \mathcal{P}_\Omega. \qquad (123)$$

Property 3 (Strong concavity).

$$F^{\mathrm{EHS}}\big(pP_1 + (1-p)P_2,\ qQ_1 + (1-q)Q_2\big) \geq \sqrt{pq}\, F^{\mathrm{EHS}}(P_1,Q_1) + \sqrt{(1-p)(1-q)}\, F^{\mathrm{EHS}}(P_2,Q_2), \qquad (124)$$

$$\forall P_1,P_2,Q_1,Q_2 \in \mathcal{P}_\Omega, \quad \forall p,q \in [0,1].$$

Property 4 (Monotonicity under generalized
measurements). $F^{\mathrm{EHS}}(P,Q)$ is monotonic under generalized
measurements in the sense of Definition 3,
$F^{\mathrm{EHS}}(P,Q) \leq F^{\mathrm{EHS}}(M(P),M(Q))$.

Corollary (Monotonicity under CPTP maps
and invariance under unitary maps). $F^{\mathrm{EHS}}(P,Q)$ is
monotonic under CPTP maps and invariant under unitary
maps.

Property 5 (Monotonicity under averaging).
Let $\bar{P}$ denote the singleton ensemble consisting of the
average state of $P(\rho)$, $\bar{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\rho$. Then

$$F^{\mathrm{EHS}}(P,Q) \leq F^{\mathrm{EHS}}(\bar{P},\bar{Q}). \qquad (125)$$

Corollary. If two distributions are close, their average
states are also close, i.e.,

$$\text{if } F^{\mathrm{EHS}}(P,Q) \geq 1 - \epsilon, \text{ then } F(\bar{\rho}_P,\bar{\rho}_Q) \geq 1 - \epsilon. \qquad (126)$$

Property 6 (The EHS fidelity is lower bounded
by the Kantorovich fidelity).

$$F^{\mathrm{EHS}}(P,Q) \geq F^{K}(P,Q). \qquad (127)$$

Property 7 (Stability). Let $P(\rho)$ and $Q(\rho)$ be two
ensembles of states in $\Omega$ and $R(\sigma')$ be an ensemble of
states in $\Omega'$, where $\Omega$ and $\Omega'$ are sets of states of two
different systems. Then,

$$F^{\mathrm{EHS}}(P \otimes R, Q \otimes R) = F^{\mathrm{EHS}}(P,Q). \qquad (128)$$

Property 8 (Convex optimization). The task of
finding the optimal $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ in Eq. (86) is a
convex optimization problem.

Limiting case 1 (Two singleton ensembles). Let
$P(\rho) = \delta_{\rho\tau}$, $\rho,\tau \in \Omega$, and $Q(\rho) = \delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$, i.e., each
of the ensembles $P$ and $Q$ consists of only a single state.
Then the fidelity between the ensembles is equal to the
fidelity between the respective states,

$$F^{\mathrm{EHS}}(P,Q) = F(\tau,\sigma). \qquad (129)$$

Limiting case 2 (One singleton ensemble). Unlike
the Kantorovich fidelity, when the ensemble $Q(\rho)$
consists of only one state $\sigma$, i.e., $Q(\rho) = \delta_{\rho\sigma}$, $\rho,\sigma \in \Omega$,
the EHS fidelity between $P(\rho)$ and $Q(\rho)$ is generally not
equal to the average fidelity between a state drawn from
the ensemble $P(\rho)$ and the state $\sigma$,

$$F^{\mathrm{EHS}}(P,Q) \neq \sum_{\rho\in\Omega} P(\rho)\, F(\rho,\sigma). \qquad (130)$$

Limiting case 3 (Classical distributions). If the
set $\Omega$ consists of perfectly distinguishable density matrices,
i.e., $F(\rho,\sigma) = \delta_{\rho\sigma}$, $\forall \rho,\sigma \in \Omega$, then $F^{\mathrm{EHS}}(P,Q)$
reduces to the fidelity $F(\bar{\rho}_P,\bar{\rho}_Q)$ between the density matrices
$\bar{\rho}_P = \sum_{\rho\in\Omega} P(\rho)\rho$ and $\bar{\rho}_Q = \sum_{\rho\in\Omega} Q(\rho)\rho$, which
is equal to the Bhattacharyya overlap between the classical
probability distributions $P$ and $Q$,

$$F^{\mathrm{EHS}}(P,Q) = \sum_{\rho\in\Omega} \sqrt{P(\rho)Q(\rho)}.$$

Comment. Unlike the Kantorovich fidelity, here both 
'classical' limits are the same. 
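
The classical limit of the fidelity can be checked with a short computation. The sketch below assumes two orthogonal qubit states and uses the standard closed form of the square root fidelity for 2x2 density matrices, $F^2 = \mathrm{Tr}(\rho\sigma) + 2\sqrt{\det\rho\,\det\sigma}$; the function names are illustrative. For perfectly distinguishable states only the diagonal terms of the sum in Eq. (88) survive, so the diagonal choice of joint distributions attains the maximum.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def qubit_fidelity(x, y):
    """Square root fidelity of two real 2x2 density matrices.

    Uses the qubit identity F^2 = Tr(x y) + 2 sqrt(det x * det y).
    """
    tr_xy = sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))
    return math.sqrt(max(0.0, tr_xy + 2 * math.sqrt(max(0.0, det2(x) * det2(y)))))

def ehs_fidelity_objective(Pj, Qj, states):
    """sum_{r,s} sqrt(Pj[r][s] Qj[r][s]) F(rho_r, rho_s), the objective of Eq. (88)."""
    n = len(states)
    return sum(math.sqrt(Pj[r][s] * Qj[r][s]) * qubit_fidelity(states[r], states[s])
               for r in range(n) for s in range(n))

# Perfectly distinguishable states |0><0| and |1><1|.
states = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
P, Q = [0.5, 0.5], [0.9, 0.1]

# Diagonal joint distributions: every state is paired with itself.
Pj = [[P[0], 0.0], [0.0, P[1]]]
Qj = [[Q[0], 0.0], [0.0, Q[1]]]

value = ehs_fidelity_objective(Pj, Qj, states)
bhattacharyya = sum(math.sqrt(p * q) for p, q in zip(P, Q))
average_state_fidelity = qubit_fidelity([[P[0], 0.0], [0.0, P[1]]],
                                        [[Q[0], 0.0], [0.0, Q[1]]])
print(value, bhattacharyya, average_state_fidelity)
```

All three printed numbers agree, in line with the statement that both 'classical' limits of the EHS fidelity coincide.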



D. Operational interpretations of the EHS 
measures 

Similarly to the Kantorovich measures, we can under- 
stand the meaning of the EHS measures from an opera- 
tional point of view. However, we present an interpreta- 
tion in the spirit of Sec. IV.D only for the EHS distance. 
For the EHS fidelity, we present an interpretation of a 
different type, in which an ensemble of density matrices 
is looked upon as the output of a stochastic quantum 
channel with a pure-state input. 
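
1. The EHS distance

Observe that Eq. (87) can be written as

$$D^{\mathrm{EHS}}(P,Q) = \min_{P(\rho,\sigma),\,Q(\rho,\sigma)} \sum_{\rho,\sigma\in\Omega} \frac{P(\rho,\sigma) + Q(\rho,\sigma)}{2} \left\| \frac{P(\rho,\sigma)}{P(\rho,\sigma) + Q(\rho,\sigma)}\, \rho - \frac{Q(\rho,\sigma)}{P(\rho,\sigma) + Q(\rho,\sigma)}\, \sigma \right\|. \qquad (131)$$

It is not difficult to see that

$$\left\| \frac{P(\rho,\sigma)}{P(\rho,\sigma) + Q(\rho,\sigma)}\, \rho - \frac{Q(\rho,\sigma)}{P(\rho,\sigma) + Q(\rho,\sigma)}\, \sigma \right\| = 2 p_{\max}(\rho,\sigma) - 1, \qquad (132)$$

where $p_{\max}(\rho,\sigma)$ is the maximum average probability
with which the two states $\rho$ and $\sigma$, occurring with
prior probabilities $\frac{P(\rho,\sigma)}{P(\rho,\sigma)+Q(\rho,\sigma)}$ and $\frac{Q(\rho,\sigma)}{P(\rho,\sigma)+Q(\rho,\sigma)}$, respectively,
can be distinguished by a measurement [6]. In the
case when each of the states $\rho$ and $\sigma$ is equally likely, the
quantity (132) reduces to $\frac{1}{2} \| \rho - \sigma \|$.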
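
Equation (132) is the familiar two-state discrimination bound, $p_{\max} = \frac{1}{2}\big(1 + \| q_\rho \rho - q_\sigma \sigma \|\big)$ for prior probabilities $q_\rho$ and $q_\sigma$. The following minimal sketch evaluates it for real qubit states; the function name `helstrom_success` and the other helpers are illustrative assumptions.

```python
import math

def ket(theta):
    """Real qubit state cos(theta/2)|0> + sin(theta/2)|1>."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def proj(v):
    """Density matrix |v><v| as a 2x2 nested list."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def trace_norm(m):
    """Trace norm of a real symmetric 2x2 matrix."""
    mean = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return abs(mean + rad) + abs(mean - rad)

def helstrom_success(q_rho, rho, q_sigma, sigma):
    """Maximum success probability for telling rho (prior q_rho) from sigma (prior q_sigma)."""
    diff = [[q_rho * rho[i][j] - q_sigma * sigma[i][j] for j in range(2)]
            for i in range(2)]
    return 0.5 * (1.0 + trace_norm(diff))

rho = proj(ket(0.0))            # |0><0|
sigma = proj(ket(math.pi / 2))  # |+><+|

equal_priors = helstrom_success(0.5, rho, 0.5, sigma)
certain = helstrom_success(1.0, rho, 0.0, sigma)
print(equal_priors, certain)
```

With equal priors, $2p_{\max} - 1$ equals the trace distance $\frac{1}{2}\|\rho - \sigma\|$ between the two states, and when one prior is zero the guess is always correct.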

Imagine that Alice is given two ensembles $P(\rho)$ and
$Q(\rho)$, $\rho \in \Omega$, which are also known to Bob. She picks
one of the two ensembles, each with probability 1/2, and
draws a random state from it. Let us say that she draws
the state $\rho$ from the first ensemble. She then sends this
state to Bob but tells him that she is sending either the
state $\rho$ drawn from the first ensemble or the state $\sigma$ drawn
from the second ensemble, where Alice can choose to say
a particular $\sigma$ depending on the $\rho$ she actually got. Bob's
task is to distinguish from which ensemble the state he
receives has been drawn, and the figure of merit of his
success is the average fraction of times he guesses correctly.
Alice's goal is to make Bob's task as difficult as possible,
with the caveat that, although she is free to choose her
strategy, she has to reveal it to Bob. Alice's strategy is
described by the probabilities $T_1(\sigma|\rho)$ with which, when
having drawn state $\rho$ from the first ensemble, she will
tell Bob that the state is either $\rho$ from the first ensemble
or $\sigma$ from the second ensemble, and the probabilities
$T_2(\rho|\sigma)$ with which, when having drawn state $\sigma$ from the
second ensemble, she will say that the state is either $\sigma$
from the second ensemble or $\rho$ from the first ensemble.
In other words, Bob is aware of the joint probabilities
$P(\rho,\sigma) = P(\rho)T_1(\sigma|\rho)$ and $Q(\rho,\sigma) = T_2(\rho|\sigma)Q(\sigma)$. Obviously,
the probability that Bob will be told that the
state he receives is either $\rho$ from the first ensemble or $\sigma$
from the second ensemble is equal to $\frac{P(\rho,\sigma)+Q(\rho,\sigma)}{2}$, and
the prior probability that in such a case the state is $\rho$ is
$\frac{P(\rho,\sigma)}{P(\rho,\sigma)+Q(\rho,\sigma)}$, while the prior probability that the state
is $\sigma$ is $\frac{Q(\rho,\sigma)}{P(\rho,\sigma)+Q(\rho,\sigma)}$. Then assuming that Bob performs
the optimal measurement to distinguish these states with
these prior probabilities, the optimal strategy for Alice
is to choose $T_1(\sigma|\rho)$ and $T_2(\rho|\sigma)$ (or equivalently, $P(\rho,\sigma)$
and $Q(\rho,\sigma)$) that minimize the quantity (131). The EHS
distance can then be understood as

$$D^{\mathrm{EHS}}(P,Q) = 2 p_{\max}(P,Q) - 1, \qquad (133)$$

where $p_{\max}(P,Q)$ is Bob's maximal probability of success
when Alice chooses her strategy optimally.

2. The EHS fidelity

For the EHS fidelity, we propose an interpretation
which is similar to the one proposed for the square root
fidelity in Ref. [4],

$$F(\rho,\sigma) = \max_{|\psi\rangle,|\phi\rangle} |\langle\psi|\phi\rangle|, \qquad (134)$$

where maximization is taken over all pure states $|\psi\rangle$ and
$|\phi\rangle$ such that $\rho = \mathcal{E}(|\psi\rangle\langle\psi|)$ and $\sigma = \mathcal{E}(|\phi\rangle\langle\phi|)$ for some
CPTP map $\mathcal{E}$. According to this interpretation, if $\rho$ and $\sigma$
are the outputs of a deterministic quantum channel with
pure-state inputs, the square root fidelity is an upper
bound on the overlap between the input states. It turns
out that the EHS fidelity provides a generalization of this
idea to stochastic quantum channels.

When a generalized measurement $M$ with measurement
superoperators $\{\mathcal{M}_i\}$ is applied to a given state
$\sigma$, it gives rise to an ensemble $P(\rho)$, $\rho \in \Omega$, with
$P(\rho) = \sum_{i:\,\sigma_i = \rho} p_i$, where $p_i = \mathrm{Tr}(\mathcal{M}_i(\sigma))$ are the probabilities
for the different measurement outcomes, and
$\sigma_i = \mathcal{M}_i(\sigma)/p_i$ are their corresponding output states.
In other words, $M$ can be viewed as a stochastic quantum
channel which for a given input state outputs an
ensemble of states. We will use the shorthand notation
$M(\sigma)$ to denote the ensemble of states resulting from the
action of the channel $M$ on the state $\sigma$.

Theorem 3 (Channel-based interpretation of
the EHS fidelity). Let $P(\rho)$ and $Q(\rho)$, $\rho \in \Omega$, be two
ensembles of density matrices on $\mathcal{H}^S$. Then,

$$F^{\mathrm{EHS}}(P,Q) = \max_{|\psi\rangle,|\phi\rangle} |\langle\psi|\phi\rangle|, \qquad (135)$$

where maximization is taken over all pure states $|\psi\rangle \in
\mathcal{H}^S$ and $|\phi\rangle \in \mathcal{H}^S$ such that $M(|\psi\rangle\langle\psi|) = \{(P(\rho),\rho)\}$,
$\rho \in \Omega$, and $M(|\phi\rangle\langle\phi|) = \{(Q(\rho),\rho)\}$, $\rho \in \Omega$, for some
stochastic channel $M$.

Proof. From the monotonicity of the EHS fidelity
under generalized measurements it follows that for any
generalized measurement $M$ and two states $|\psi\rangle$ and $|\phi\rangle$,

$$F^{\mathrm{EHS}}\big(M(|\psi\rangle\langle\psi|), M(|\phi\rangle\langle\phi|)\big) \geq |\langle\psi|\phi\rangle|. \qquad (136)$$

Therefore, we only have to show that there exist states
$|\psi\rangle, |\phi\rangle \in \mathcal{H}^S$ and a generalized measurement $M$, for
which equality is attained.

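Let $P(\rho,\sigma)$ and $Q(\rho,\sigma)$ be two joint probability distributions
which achieve the maximum in Eq. (88) for
the pair of probability distributions $P(\rho)$ and $Q(\rho)$. From
Uhlmann's theorem [1] we know that for any pair $(\rho,\sigma) \in
\Omega \times \Omega$, there exist purifications $|\psi_{\rho,\sigma}\rangle^{SB} \in \mathcal{H}^S \otimes \mathcal{H}^B$ and
$|\phi_{\rho,\sigma}\rangle^{SB} \in \mathcal{H}^S \otimes \mathcal{H}^B$ of $\rho$ and $\sigma$, respectively, such that
$F(\rho,\sigma) = \langle\psi_{\rho,\sigma}|\phi_{\rho,\sigma}\rangle^{SB}$. The second system $B$ can be
chosen to have the same dimension as that of $S$. Let
us introduce a third system with a Hilbert space $\mathcal{H}^E$ of
dimension $N^2$, where $N$ is the cardinality of the set $\Omega$.
Let $\{|(\rho,\sigma)\rangle^E\}$, $(\rho,\sigma) \in \Omega \times \Omega$, be an orthonormal basis
of $\mathcal{H}^E$. From Eq. (88) one can readily see that the pure
states

$$|P\rangle^{SBE} = \sum_{\rho,\sigma\in\Omega} \sqrt{P(\rho,\sigma)}\, |\psi_{\rho,\sigma}\rangle^{SB} |(\rho,\sigma)\rangle^{E}, \qquad (137)$$

$$|Q\rangle^{SBE} = \sum_{\rho,\sigma\in\Omega} \sqrt{Q(\rho,\sigma)}\, |\phi_{\rho,\sigma}\rangle^{SB} |(\rho,\sigma)\rangle^{E}, \qquad (138)$$

by construction satisfy

$$\langle P|Q\rangle^{SBE} = F^{\mathrm{EHS}}(P,Q). \qquad (139)$$

Notice that there exists a unitary transformation $U \in
\mathcal{L}(\mathcal{H}^S \otimes \mathcal{H}^B \otimes \mathcal{H}^E)$ such that

$$U|\psi\rangle^{S}|0\rangle^{BE} = |P\rangle^{SBE}, \qquad (140)$$

$$U|\phi\rangle^{S}|0\rangle^{BE} = |Q\rangle^{SBE}, \qquad (141)$$

where $|0\rangle^{BE}$ is some state in $\mathcal{H}^B \otimes \mathcal{H}^E$, and $|\psi\rangle^S$ and
$|\phi\rangle^S$ are states in $\mathcal{H}^S$. Since unitary operations preserve
the overlap between states,

$$\langle\psi|\phi\rangle^{S} = \langle P|Q\rangle^{SBE} = F^{\mathrm{EHS}}(P,Q). \qquad (142)$$

But from the states $|P\rangle^{SBE}$ and $|Q\rangle^{SBE}$ we can obtain
the ensembles $\{(P(\rho),\rho)\}$ and $\{(Q(\rho),\rho)\}$, respectively,
by performing a destructive measurement on subsystem
$\mathcal{H}^E$ in the basis $\{|(\rho,\sigma)\rangle^E\}$ and tracing out subsystem
$\mathcal{H}^B$. Therefore, starting from the two states $|\psi\rangle$
and $|\phi\rangle$ we can obtain the ensembles $\{(P(\rho),\rho)\}$ and
$\{(Q(\rho),\rho)\}$ by appending the state $|0\rangle^{BE}$, applying the
unitary operation $U$, measuring in the basis $\{|(\rho,\sigma)\rangle^E\}$
and discarding system $B$. This operation is equivalent to
a generalized measurement $M$ on system $S$. This completes
the proof.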

VI. AN ENSEMBLE-BASED 
INTERPRETATION OF THE SQUARE ROOT 
FIDELITY 

As we pointed out in Sec. V, the EHS fidelity can
be formulated without reference to an extended Hilbert
space (Eq. (88)). In the case when the set $\Omega$ consists of
pure states, the quantity (88) can be written as

$$F^{\mathrm{EHS}}(P,Q) = \max_{P(\psi,\phi),\,Q(\psi,\phi)} \sum_{\psi,\phi\in\Omega} \sqrt{P(\psi,\phi)Q(\psi,\phi)}\, |\langle\psi|\phi\rangle|, \qquad (143)$$

where optimization is taken over all joint distributions
$P(\psi,\phi)$ with left marginal $P(\psi)$, and $Q(\psi,\phi)$ with
right marginal $Q(\phi)$. Notice that for fixed $P(\psi,\phi)$ and
$Q(\psi,\phi)$, the quantity $\sum_{\psi,\phi\in\Omega} \sqrt{P(\psi,\phi)Q(\psi,\phi)}\, |\langle\psi|\phi\rangle|$
can be thought of as a generalization of the Bhattacharyya
overlap between classical probability distributions
over the variable $(\psi,\phi)$, where the overlap
$\sqrt{P(\psi,\phi)Q(\psi,\phi)}$ between the probabilities $P(\psi,\phi)$ and
$Q(\psi,\phi)$ is modified by the factor $|\langle\psi|\phi\rangle|$. Heuristically,
we could think that the probabilities of the two
distributions are of a quantum nature, i.e., instead of
$P(\psi,\phi)$ and $Q(\psi,\phi)$ at a given point $(\psi,\phi)$, we have
$P(\psi,\phi)|\psi\rangle\langle\psi|$ and $Q(\psi,\phi)|\phi\rangle\langle\phi|$, whose overlap is given
by $\sqrt{P(\psi,\phi)Q(\psi,\phi)}\, |\langle\psi|\phi\rangle|$. Note that expression (143)
is formulated without any reference to mixed-state fidelity.

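Theorem 4. The square root fidelity $F(\rho,\sigma) =
\mathrm{Tr}\sqrt{\sqrt{\sigma}\rho\sqrt{\sigma}}$ is equal to the maximum of the fidelity (143)
between all possible pure-state ensembles whose average
density matrices are equal to $\rho$ and $\sigma$, i.e.,

$$F(\rho,\sigma) = \max_{P,Q} F^{\mathrm{EHS}}(P,Q), \qquad (144)$$

where maximization is taken over all $P =
\{(P(\psi), |\psi\rangle\langle\psi|)\}$ and $Q = \{(Q(\phi), |\phi\rangle\langle\phi|)\}$, such
that

$$\sum_{\psi} P(\psi)\, |\psi\rangle\langle\psi| = \rho, \qquad (145)$$

$$\sum_{\phi} Q(\phi)\, |\phi\rangle\langle\phi| = \sigma. \qquad (146)$$

More succinctly,

$$F(\rho,\sigma) = \max_{\Omega,\,P(\psi,\phi),\,Q(\psi,\phi)} \sum_{\psi,\phi\in\Omega} \sqrt{P(\psi,\phi)Q(\psi,\phi)}\, |\langle\psi|\phi\rangle|, \qquad (147)$$

where maximization is taken over all sets of pure states
$\Omega$ and joint distributions $P(\psi,\phi)$ and $Q(\psi,\phi)$, $|\psi\rangle, |\phi\rangle \in \Omega$,
such that

$$\sum_{\psi,\phi\in\Omega} P(\psi,\phi)\, |\psi\rangle\langle\psi| = \rho, \qquad (148)$$

$$\sum_{\psi,\phi\in\Omega} Q(\psi,\phi)\, |\phi\rangle\langle\phi| = \sigma. \qquad (149)$$
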
Proof. From the monotonicity of $F^{\mathrm{EHS}}(P,Q)$ under
averaging, it follows that

$$F(\rho,\sigma) \geq \max_{\Omega,\,P(\psi,\phi),\,Q(\psi,\phi)} \sum_{\psi,\phi\in\Omega} \sqrt{P(\psi,\phi)Q(\psi,\phi)}\, |\langle\psi|\phi\rangle|. \qquad (150)$$

To prove that there are pure-state ensembles for which
equality is achieved, we will make use of Uhlmann's theorem
[1] according to which

$$F(\rho,\sigma) = \max_{|\Psi\rangle,|\Phi\rangle} |\langle\Psi|\Phi\rangle|, \qquad (151)$$

where maximization is taken over all possible purifications
$|\Psi\rangle$ and $|\Phi\rangle$ of $\rho$ and $\sigma$, respectively. Let $|\Psi_0\rangle$
and $|\Phi_0\rangle$ be two purifications for which the maximum
in Eq. (151) is attained. Choose an orthonormal basis
$\{|i\rangle\}$ in the auxiliary system needed for the purification.
We can write

$$|\Psi_0\rangle = \sum_i \alpha_i\, |\psi_i\rangle |i\rangle, \qquad (152)$$

$$|\Phi_0\rangle = \sum_i \beta_i\, |\phi_i\rangle |i\rangle. \qquad (153)$$

The overlap between these states can be written as

$$|\langle\Psi_0|\Phi_0\rangle| = \Big| \sum_i \alpha_i^{*}\beta_i \langle\psi_i|\phi_i\rangle \Big| \leq \sum_i |\alpha_i||\beta_i||\langle\psi_i|\phi_i\rangle|. \qquad (154)$$

Notice that if we change arbitrarily the phases of $\alpha_i$ and
$\beta_i$ in Eqs. (152) and (153), we obtain valid (although not
necessarily optimal) purifications of $\rho$ and $\sigma$. If we choose
the phases such that all the quantities $\alpha_i^{*}\beta_i\langle\psi_i|\phi_i\rangle$
have the same phase, then equality in Eq. (154) is attained.
Therefore, for optimal purifications we have

$$|\langle\Psi_0|\Phi_0\rangle| = \sum_i |\alpha_i||\beta_i||\langle\psi_i|\phi_i\rangle|. \qquad (155)$$

Notice that the ensembles $\{(|\alpha_i|^2, |\psi_i\rangle\langle\psi_i|)\}$ and
$\{(|\beta_i|^2, |\phi_i\rangle\langle\phi_i|)\}$ are such that their averages give
rise to $\rho$ and $\sigma$, i.e., they are among those ensembles
over which maximization in Eq. (144) is taken. But
$\sum_i |\alpha_i||\beta_i||\langle\psi_i|\phi_i\rangle|$ is exactly of the form on the right-hand
side of Eq. (147), i.e., equality in Eq. (150) is
attained by $\{(|\alpha_i|^2, |\psi_i\rangle\langle\psi_i|)\}$ and $\{(|\beta_i|^2, |\phi_i\rangle\langle\phi_i|)\}$. This
completes the proof.

Clearly, all interpretations of the fidelity must be 
equivalent, but they provide different intuitive ways of 
understanding the same quantity. Theorem 4 gives an 
interpretation based on the pure-state ensembles from 
which a mixed state can be prepared by averaging and 
thus reflects the common intuition of mixed states as de- 
scribing mixtures of pure states. 
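As a quick sanity check of Theorem 4, consider the special case of commuting states (an illustrative choice made here, not an example taken from the text), $\rho=\sum_i p_i|i\rangle\langle i|$ and $\sigma=\sum_i q_i|i\rangle\langle i|$ with $\{|i\rangle\}$ orthonormal. The following worked equations sketch why the eigen-ensembles attain the maximum in Eq. (147), with the pair $(\psi,\phi)$ ranging over $(|i\rangle,|j\rangle)$:

```latex
% Assumption: \rho and \sigma are diagonal in the same orthonormal basis \{|i\rangle\}.
\begin{align*}
F(\rho,\sigma) &= \mathrm{Tr}\sqrt{\sqrt{\sigma}\,\rho\,\sqrt{\sigma}} = \sum_i \sqrt{p_i q_i}, \\
P(i,j) &= p_i\,\delta_{ij}, \qquad Q(i,j) = q_j\,\delta_{ij}, \\
\sum_{i,j} P(i,j)\,|i\rangle\langle i| &= \rho, \qquad \sum_{i,j} Q(i,j)\,|j\rangle\langle j| = \sigma, \\
\sum_{i,j}\sqrt{P(i,j)\,Q(i,j)}\;|\langle i|j\rangle| &= \sum_i \sqrt{p_i q_i} = F(\rho,\sigma).
\end{align*}
```

In this special case the purifications $\sum_i\sqrt{p_i}\,|i\rangle|i\rangle$ and $\sum_i\sqrt{q_i}\,|i\rangle|i\rangle$ have overlap $\sum_i\sqrt{p_i q_i}$, so they play the role of the optimal purifications used in the proof.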



VII. DISTANCE AND FIDELITY BETWEEN 
STOCHASTIC QUANTUM OPERATIONS 

In practice, it often makes sense to ask how close two 
quantum processes are. For example, we may want to 
compare an ideal quantum operation which we would like 
to implement, with an imperfect operation that we are 
able to implement. Distance measures between deterministic quantum operations (CPTP maps) have been
defined, e.g., in Ref. [20]. However, a similar treatment
for stochastic quantum operations (generalized measure- 
ments) has been missing. Stochastic operations are an 
important tool for quantum information processing with 
applications in various areas, such as quantum control, 
state estimation, entanglement manipulation, and error 
correction, to name a few. Identifying such measures 
could thus be very useful. 



Before we propose distinguishability measures between 
stochastic quantum operations, let us discuss what we 
mean when we say that two such operations are different. 
For the purposes of the present paper, we will identify a stochastic quantum operation $\mathbf{M}$ (or a generalized measurement) with an ensemble $\{(m_i, \mathcal{M}_i)\}$, $m_i > 0$, of different completely positive measurement superoperators $\mathcal{M}_i(\cdot) = \sum_j M_{ij}(\cdot)M_{ij}^\dagger$ which are normalized as

$$\mathrm{Tr}\Big(\sum_j M_{ij}^\dagger M_{ij}\Big) = d, \quad \forall i, \tag{156}$$

and satisfy

$$\sum_{i,j} m_i M_{ij}^\dagger M_{ij} = I. \tag{157}$$

The unnormalized measurement superoperators $\tilde{\mathcal{M}}_i(\cdot) = \sum_j \tilde{M}_{ij}(\cdot)\tilde{M}_{ij}^\dagger$ which appear in the usual description of a measurement (Eq. ) are related to the normalized ones via

$$\mathcal{M}_i = \tilde{\mathcal{M}}_i/m_i, \tag{158}$$

$$m_i = \mathrm{Tr}\Big(\sum_j \tilde{M}_{ij}^\dagger \tilde{M}_{ij}\Big)/d. \tag{159}$$

Notice that the weights $m_i$ satisfy $\sum_i m_i = 1$, i.e., they can be thought of as 'probabilities' and $\{(m_i, \mathcal{M}_i)\}$ can be thought of as a 'probabilistic' ensemble of normalized superoperators $\mathcal{M}_i$. Note, however, that $m_i$ are not equal to the probabilities of the measurement outcomes, which generally depend on the input state $\rho$ and are given by $p_i = m_i\mathrm{Tr}(\mathcal{M}_i(\rho))$.
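The normalization in Eqs. (156)-(159) can be checked numerically. The sketch below is only an illustration under stated assumptions: the two-outcome qubit measurement (an amplitude-damping-like pair of Kraus operators), the parameter `gamma`, and all function names are hypothetical choices, and every outcome is assumed to have nonzero weight $m_i$.

```python
import math

def dagger(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(a[r][c]).conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def normalize_measurement(kraus_by_outcome, d):
    """Return the weights m_i (Eq. 159) and Kraus operators rescaled by
    1/sqrt(m_i), so that each outcome satisfies Eq. (156)."""
    weights, normalized = [], []
    for ops in kraus_by_outcome:
        m = sum(trace(matmul(dagger(k), k)).real for k in ops) / d
        weights.append(m)
        normalized.append([[[x / math.sqrt(m) for x in row] for row in k]
                           for k in ops])
    return weights, normalized

def unnormalized_trace(ops, rho):
    """Tr(sum_j K_j rho K_j^dagger) for the Kraus operators of one outcome."""
    return sum(trace(matmul(matmul(k, rho), dagger(k))).real for k in ops)

gamma = 0.36  # hypothetical damping parameter
unnormalized = [
    [[[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]]],  # outcome 0, one Kraus operator
    [[[0.0, math.sqrt(gamma)], [0.0, 0.0]]],      # outcome 1, one Kraus operator
]
weights, normalized = normalize_measurement(unnormalized, d=2)

rho = [[0.25, 0.0], [0.0, 0.75]]  # some input state
p_direct = [unnormalized_trace(ops, rho) for ops in unnormalized]
p_from_ensemble = [m * unnormalized_trace(ops, rho)
                   for m, ops in zip(weights, normalized)]
```

For this choice the weights come out as 0.82 and 0.18, while the outcome probabilities for the chosen input are 0.73 and 0.27, which illustrates that the $m_i$ are not the outcome probabilities.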

The reason why we associate different outcomes with 
normalized superoperators is that we want our descrip- 
tion to explicitly emphasize the fact that measurement 
outcomes whose measurement superoperators differ from 
each other only by a factor are not considered different. 
This is because for us a generalized measurement is not 
a characterization of a particular physical device (which 
could produce classical readings not necessarily related to 
the quantum system of interest), but the most abstract 
characterization of an operation on the state of the quan- 
tum system, which includes information extraction as 
well as state transformation. Clearly, two measurement 
superoperators which differ from each other by a factor 
do not provide any different information about the state 
of the system prior to the measurement (according to 
Bayes's rule) nor give rise to different post-measurement 
states. Note that when we say that two normalized measurement superoperators $\mathcal{M}_i(\cdot) = \sum_j M_{ij}(\cdot)M_{ij}^\dagger$ and $\mathcal{N}_k(\cdot) = \sum_l N_{kl}(\cdot)N_{kl}^\dagger$ are the same, we compare them as completely positive maps, i.e., irrespectively of their operator-sum representations. In other words, $\mathcal{M}_i = \mathcal{N}_k$ if and only if there exists a unitary matrix with components $U_{jl}$, such that $M_{ij} = \sum_l U_{jl}N_{kl}$, $\forall j$ Q. In that
sense, if two measurements are described by identical en- 
sembles of normalized measurement superoperators, they 
are the same measurement. Conversely, if two measurements are described by different ensembles of normalized measurement superoperators, they should be considered different because they either give rise to different output ensembles for some input, or provide different information about the input state, or both. Therefore, we will specify a generalized measurement $\mathbf{M}$ by the correspondence

$$\mathbf{M} \leftrightarrow \{(m_i, \mathcal{M}_i)\}. \tag{160}$$

There are many possible ways in which one can define 
distance between two quantum operations. The following 
desirable properties for a distance $D$ between deterministic quantum operations $\mathcal{E}$ and $\mathcal{F}$ were pointed out and discussed in Ref. [20]: (1) metric: the measure should be positive, symmetric, satisfy the triangle inequality, and vanish if and only if the two operations are identical; (2) computability: it should be possible to evaluate $D$ in a direct manner; (3) measurability: there should be an achievable experimental procedure for determining $D$; (4) physical interpretation: the distance should have a well motivated physical interpretation; (5) stability: $D(\mathcal{I}\otimes\mathcal{E}, \mathcal{I}\otimes\mathcal{F}) = D(\mathcal{E},\mathcal{F})$, which means that unrelated physical systems should not affect the value of $D$; (6) chaining: $D(\mathcal{E}_2\circ\mathcal{E}_1, \mathcal{F}_2\circ\mathcal{F}_1) \le D(\mathcal{E}_1,\mathcal{F}_1) + D(\mathcal{E}_2,\mathcal{F}_2)$, i.e., for a process composed of several steps, the total error should not exceed the sum of the errors in the individual steps. We will consider the same requirements for
a distance between stochastic quantum operations. In 
the deterministic case, in view of the above desiderata, 
two main approaches to distinguishing quantum opera- 
tions stand out — comparison based on the Jamiolkowski 
isomorphism and worst-case comparison. We will adopt 
the same approaches here. 

Since many of the properties for the following measures
and their proofs are similar to those discussed in Ref. [20],
we will only comment on them briefly. In what follows,
we will use D and F to denote distance and fidelity be- 
tween ensembles, which can be either of the Kantorovich 
or of the EHS type. We will use $\mathbf{M}(\rho)$ to denote the ensemble of output states that results from the action of a
stochastic quantum operation $\mathbf{M}$ on an input state $\rho$.

A. Measures based on the Jamiolkowski 
isomorphism 

The Jamiolkowski isomorphism [27] is a one-to-one correspondence between completely positive maps (superoperators) $\mathcal{M}: \mathcal{B}(\mathcal{H}^S) \to \mathcal{B}(\mathcal{H}^S)$ and positive operators $\rho_{\mathcal{M}} \in \mathcal{B}(\mathcal{H}^A\otimes\mathcal{H}^S)$, where $\dim(\mathcal{H}^A) = \dim(\mathcal{H}^S) = d$. The correspondence is established via

$$\rho_{\mathcal{M}} = \mathcal{I}^A\otimes\mathcal{M}^S(|\Phi\rangle\langle\Phi|^{AS}), \tag{161}$$

where $|\Phi\rangle^{AS} = \sum_j |j\rangle^A|j\rangle^S/\sqrt{d}$ is a maximally entangled state on $\mathcal{H}^A\otimes\mathcal{H}^S$ (here $\{|j\rangle^A\}$ and $\{|j\rangle^S\}$ are orthonormal bases of $\mathcal{H}^A$ and $\mathcal{H}^S$, respectively). Notice that if the completely positive map $\mathcal{M}$ is trace-preserving, the corresponding positive operator $\rho_{\mathcal{M}}$ is a density matrix, i.e., $\mathrm{Tr}(\rho_{\mathcal{M}}) = 1$. However, not all density matrices on $\mathcal{H}^A\otimes\mathcal{H}^S$ correspond to CPTP maps, but only those whose reduced density matrix on subsystem $A$ is the maximally mixed state $I/d$. It is easy to see that most generally, a density matrix on $\mathcal{H}^A\otimes\mathcal{H}^S$ corresponds to a completely positive superoperator $\mathcal{M}(\cdot) = \sum_i M_i(\cdot)M_i^\dagger$, which is normalized as

$$\mathrm{Tr}\Big(\sum_i M_i^\dagger M_i\Big) = d. \tag{162}$$

The reverse is also true: every completely positive superoperator on $\mathcal{B}(\mathcal{H}^S)$ which satisfies Eq. (162) gives rise to a density matrix when applied to $|\Phi\rangle\langle\Phi|^{AS}$. We therefore see that there is an isomorphism

$$\{(m_i, \mathcal{M}_i)\} \leftrightarrow \{(m_i, \rho_{\mathcal{M}_i})\} \tag{163}$$

between ensembles of normalized completely positive superoperators and ensembles of density matrices. Of course, just like not every completely positive map corresponding to a density matrix is trace preserving, not every ensemble $\{(m_i, \mathcal{M}_i)\}$, $\sum_i m_i = 1$, forms a generalized measurement ($\sum_{i,j} m_i M_{ij}^\dagger M_{ij} = I$). But since the reverse is true, we can use the isomorphism to define distance and fidelity between generalized measurements through the distance and fidelity between ensembles of states.
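The correspondence of Eq. (161) and the normalization condition of Eq. (162) can be illustrated numerically. The sketch below is a minimal pure-Python illustration; the two example maps (an amplitude-damping channel and a rescaled projector) and all function names are hypothetical choices made for this example, with subsystem $A$ taken as the first tensor factor.

```python
import math

def dagger(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(a[r][c]).conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def apply_map(kraus, x):
    """Apply M(x) = sum_i M_i x M_i^dagger."""
    d = len(kraus[0])
    out = [[0j] * d for _ in range(d)]
    for k in kraus:
        term = matmul(matmul(k, x), dagger(k))
        for r in range(d):
            for c in range(d):
                out[r][c] += term[r][c]
    return out

def jamiolkowski_state(kraus, d):
    """rho_M = (I^A (x) M^S)(|Phi><Phi|) with |Phi> = sum_j |j>|j>/sqrt(d)."""
    rho = [[0j] * (d * d) for _ in range(d * d)]
    for j in range(d):
        for k in range(d):
            # |j><k| on S; its image under M fills block (j, k) of rho_M
            unit = [[1.0 if (r == j and c == k) else 0.0 for c in range(d)]
                    for r in range(d)]
            block = apply_map(kraus, unit)
            for r in range(d):
                for c in range(d):
                    rho[j * d + r][k * d + c] = block[r][c] / d
    return rho

def reduced_state_A(rho, d):
    """Partial trace over S (the second tensor factor)."""
    return [[sum(rho[j * d + s][k * d + s] for s in range(d))
             for k in range(d)] for j in range(d)]

gamma = 0.36  # hypothetical damping parameter
# Trace-preserving example: amplitude damping.
cptp = [[[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]],
        [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]
# Normalized (Eq. 162) but not trace-preserving example: sqrt(2)|0><0|.
projector = [[[math.sqrt(2.0), 0.0], [0.0, 0.0]]]

rho_cptp = jamiolkowski_state(cptp, 2)
rho_proj = jamiolkowski_state(projector, 2)
red_cptp = reduced_state_A(rho_cptp, 2)
red_proj = reduced_state_A(rho_proj, 2)
```

Both operators have unit trace, but only the trace-preserving map yields the maximally mixed reduced state $I/d$ on $A$; for the rescaled projector the reduced state is $|0\rangle\langle 0|$.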

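Definition 7 (Distance between generalized measurements based on the Jamiolkowski isomorphism). Let $\mathbf{M}$ and $\mathbf{N}$ be two generalized measurements acting on $\mathcal{B}(\mathcal{H}^S)$. Then,

$$D_{\rm iso}(\mathbf{M},\mathbf{N}) = D\big(\mathbf{I}^A\otimes\mathbf{M}^S(|\Phi\rangle\langle\Phi|^{AS}),\ \mathbf{I}^A\otimes\mathbf{N}^S(|\Phi\rangle\langle\Phi|^{AS})\big), \tag{164}$$

where $\mathbf{I}^A\otimes\mathbf{M}^S$ and $\mathbf{I}^A\otimes\mathbf{N}^S$ denote the generalized measurements $\mathbf{M}$ and $\mathbf{N}$ applied locally on subsystem $S$ and $|\Phi\rangle^{AS} = \sum_j |j\rangle^A|j\rangle^S/\sqrt{d}$ is a maximally entangled state on $\mathcal{H}^A\otimes\mathcal{H}^S$.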

Property 1 (Metric). It follows from the metric 
properties of D. 

Property 2 (Computability). It follows from the 
computability of D which is either a linear program (in 
the Kantorovich case) or a convex-optimization problem 
(in the EHS case). 

Property 3 (Measurability). As in the deterministic case, $D_{\rm iso}$ can be determined by doing full process
tomography |4lll4^|. 

Property 4 (Physical interpretation). In addition to the obvious meaning of $D_{\rm iso}$ following from its definition, it was pointed out in Ref. [20] that in the deterministic case, $D_{\rm iso}(\mathcal{E},\mathcal{F}) \ge \frac{1}{d}\sum_x \Delta(\mathcal{E}(|x\rangle\langle x|), \mathcal{F}(|x\rangle\langle x|))$, where the sum is over a set of orthonormal basis states $|x\rangle$ which can be thought of as the different instances of a computational problem. In a similar manner, it can be seen that $D_{\rm iso}(\mathbf{M},\mathbf{N}) \ge \frac{1}{d}\sum_x D(\mathbf{M}(|x\rangle\langle x|), \mathbf{N}(|x\rangle\langle x|))$.
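One way to see the deterministic-case bound, sketched here under the assumption that $D_{\rm iso}$ is built on the trace distance $\Delta$, is to dephase subsystem $A$ in the basis $\{|x\rangle\}$ and use the monotonicity of $\Delta$ under CPTP maps. The dephasing map $\mathcal{D}^A$ is notation introduced only for this sketch:

```latex
% \mathcal{D}^A: CPTP dephasing map on A in the basis \{|x\rangle\} (notation assumed for this sketch).
\begin{align*}
\rho_{\mathcal{E}} &= \frac{1}{d}\sum_{x,y}|x\rangle\langle y|^A\otimes\mathcal{E}(|x\rangle\langle y|), \qquad
(\mathcal{D}^A\otimes\mathcal{I}^S)(\rho_{\mathcal{E}}) = \frac{1}{d}\sum_x |x\rangle\langle x|^A\otimes\mathcal{E}(|x\rangle\langle x|), \\
D_{\rm iso}(\mathcal{E},\mathcal{F}) &= \Delta(\rho_{\mathcal{E}},\rho_{\mathcal{F}})
\ \ge\ \Delta\big((\mathcal{D}^A\otimes\mathcal{I}^S)(\rho_{\mathcal{E}}),\,(\mathcal{D}^A\otimes\mathcal{I}^S)(\rho_{\mathcal{F}})\big)
= \frac{1}{d}\sum_x \Delta\big(\mathcal{E}(|x\rangle\langle x|),\,\mathcal{F}(|x\rangle\langle x|)\big).
\end{align*}
```

The last equality uses the fact that the trace distance between two block-diagonal operators is the sum of the trace distances between corresponding blocks.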

Property 5 (Stability). It follows from the stability 
of D. 






Property 6 (Chaining). The proof of this property assumes monotonicity of $D$ under generalized measurements and therefore it holds for the EHS distance. Similarly to the deterministic case [20], it can be shown that $D_{\rm iso}$ satisfies $D_{\rm iso}(\mathbf{M}_2\circ\mathbf{M}_1, \mathbf{N}_2\circ\mathbf{N}_1) \le D_{\rm iso}(\mathbf{M}_2,\mathbf{N}_2) + D_{\rm iso}(\mathbf{M}_1,\mathbf{N}_1)$, provided that $\mathbf{N}_1$ is a unital measurement, i.e., $\sum_j n_{1j}\mathcal{N}_{1j}(I) = I$, where $\{(n_{1j},\mathcal{N}_{1j})\}$ is the ensemble of normalized measurement superoperators corresponding to $\mathbf{N}_1$.

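Definition 8 (Fidelity between generalized measurements based on the Jamiolkowski isomorphism). Let $\mathbf{M}$ and $\mathbf{N}$ be two generalized measurements acting on $\mathcal{B}(\mathcal{H}^S)$. Then,

$$F_{\rm iso}(\mathbf{M},\mathbf{N}) = F\big(\mathbf{I}^A\otimes\mathbf{M}^S(|\Phi\rangle\langle\Phi|^{AS}),\ \mathbf{I}^A\otimes\mathbf{N}^S(|\Phi\rangle\langle\Phi|^{AS})\big). \tag{165}$$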

The fidelity satisfies similar properties to those of the 
distance, except for the triangle inequality. 

B. Measures based on worst-case comparison 

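Definition 9 (Distance between generalized measurements based on the worst case). Let $\mathbf{M}$ and $\mathbf{N}$ be two generalized measurements acting on $\mathcal{B}(\mathcal{H}^S)$, $\dim(\mathcal{H}^S) = d$. Introduce an ancillary system $A$ with a Hilbert space $\mathcal{H}^A$, $\dim(\mathcal{H}^A) = d$. Then,

$$D_{\max}(\mathbf{M},\mathbf{N}) = \max_{|\psi\rangle} D\big(\mathbf{I}^A\otimes\mathbf{M}^S(|\psi\rangle\langle\psi|),\ \mathbf{I}^A\otimes\mathbf{N}^S(|\psi\rangle\langle\psi|)\big), \tag{166}$$

where maximum is taken over all $|\psi\rangle \in \mathcal{H}^A\otimes\mathcal{H}^S$.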

The definition is based on a maximization over states 
in an extended Hilbert space in order to guarantee sta- 
bility of the distance, as it is known that without this 
extension even the analogously defined distance between 
CPTP maps based on the trace distance is not stable [3!| . 
Note that this definition takes maximum over pure-state 
inputs. As we saw in Sec. IV. E, a generalized measure- 
ment can be defined to act on ensembles of mixed states 
so that it most generally transforms ensembles of density 
matrices into ensembles of density matrices. However, 
it is easy to see that one cannot obtain a larger value 
by maximizing over mixed states or ensembles of mixed 
states. This follows from the joint convexity of $D$ with
respect to ensembles and from the joint convexity of $\Delta$
with respect to mixed states.

Property 1 (Metric). It follows from the metric properties of $D$. (The fact that the distance between different measurements is non-zero follows from the fact that for the input state $|\Phi\rangle^{AS}$, different measurements yield different output ensembles.)

Property 2 (Computability). We already pointed out that the measure $D$ for any particular pair of ensembles is computable. In Ref. [20] it was argued that in the case of deterministic operations, the corresponding optimization in Eq. (166) is a convex optimization problem and therefore computable. By a similar argument it can be seen that for stochastic quantum operations, finding the maximum in Eq. (166) is also a convex optimization problem.

Property 3 (Measurability). Here too, the value of $D_{\max}$ can be determined using quantum process tomography 0,111.

Property 4 (Physical interpretation). The physical meaning of $D_{\max}$ follows directly from its definition
and the physical meaning of $D$.

Property 5 (Stability). The proof goes along the same lines as the proof for the deterministic case (Ref. [20]): all one needs to show is that the quantity (166) is independent of the dimension of system $A$, as long as this dimension is greater than or equal to $d$. This follows from the observation that an input state which achieves the maximum in Eq. (166) can have at most $d$ Schmidt coefficients, which implies that there is a subspace of $\mathcal{H}^A$ with dimension $d$ such that the maximum can be achieved by maximization inside that subspace.

Property 6 (Chaining). The chaining property fol- 
lows from the triangle inequality and the monotonicity of 
D under generalized measurements, i.e., it holds for the 
EHS distance. 

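Definition 10 (Fidelity between generalized measurements based on the worst case). Let $\mathbf{M}$ and $\mathbf{N}$ be two generalized measurements acting on $\mathcal{B}(\mathcal{H}^S)$, $\dim(\mathcal{H}^S) = d$. Introduce an ancillary system $A$ with a Hilbert space $\mathcal{H}^A$, $\dim(\mathcal{H}^A) = d$. Then,

$$F_{\min}(\mathbf{M},\mathbf{N}) = \min_{|\psi\rangle} F\big(\mathbf{I}^A\otimes\mathbf{M}^S(|\psi\rangle\langle\psi|),\ \mathbf{I}^A\otimes\mathbf{N}^S(|\psi\rangle\langle\psi|)\big), \tag{167}$$

where minimum is taken over all $|\psi\rangle \in \mathcal{H}^A\otimes\mathcal{H}^S$.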

The fidelity $F_{\min}$ satisfies properties analogous to those
of $D_{\max}$ with the exception of the triangle inequality.

C. Distance and fidelity between POVMs 

A very useful concept in quantum information is that of a positive operator-valued measure (POVM): a set of positive operators $\{E_i\}$, $E_i \ge 0$, which sum up to the identity, $\sum_i E_i = I$. A POVM provides the most general description of a quantum measurement in situations where one is not interested in the post-measurement state. In terms of the unnormalized measurement superoperators $\tilde{\mathcal{M}}_i(\cdot) = \sum_j \tilde{M}_{ij}(\cdot)\tilde{M}_{ij}^\dagger$, the POVM elements are given by $E_i = \sum_j \tilde{M}_{ij}^\dagger\tilde{M}_{ij}$, i.e., there is no unique generalized measurement which corresponds to a given POVM. Similarly to the case of generalized measurements, we can express a POVM as an ensemble of normalized POVM elements, $\{(m_i, \bar{E}_i)\}$, where $m_i = \mathrm{Tr}(E_i)/d$, $\bar{E}_i = E_i/m_i$. Notice that the operators

$$\rho_{E_i} = \bar{E}_i/d \tag{168}$$

are density matrices ($\mathrm{Tr}(\rho_{E_i}) = 1$), i.e., there is a one-to-one correspondence between POVMs and ensembles of density matrices $\{(m_i, \rho_{E_i})\}$ which satisfy $\sum_i m_i\rho_{E_i} = I/d$. Therefore, we can compare POVMs directly using the distinguishability measures between ensembles of states.
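The map from a POVM to its ensemble of density matrices can be sketched in a few lines. The example below is only an illustration: the three-element qubit POVM and the function name `povm_to_ensemble` are hypothetical choices, and each element is assumed to have nonzero trace.

```python
def povm_to_ensemble(povm):
    """Map POVM elements E_i to pairs (m_i, rho_i) with
    m_i = Tr(E_i)/d and rho_i = E_i/(m_i d)."""
    d = len(povm[0])
    ensemble = []
    for e in povm:
        m = sum(e[i][i] for i in range(d)).real / d
        rho = [[x / (m * d) for x in row] for row in e]
        ensemble.append((m, rho))
    return ensemble

# Hypothetical qubit POVM: (1/2)|+><+|, (1/2)|-><-|, (1/2) I.
povm = [
    [[0.25, 0.25], [0.25, 0.25]],
    [[0.25, -0.25], [-0.25, 0.25]],
    [[0.5, 0.0], [0.0, 0.5]],
]
ensemble = povm_to_ensemble(povm)
weights = [m for m, _ in ensemble]

# The ensemble average should equal the maximally mixed state I/d.
average = [[sum(m * rho[r][c] for m, rho in ensemble) for c in range(2)]
           for r in range(2)]
```

For this POVM the weights are 1/4, 1/4 and 1/2, and the ensemble average is $I/2$, as required by the constraint $\sum_i m_i\rho_{E_i} = I/d$.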

Definition 11 (Distance between POVMs). Let $\{E_i\}$ and $\{G_j\}$ be two POVMs and let $P_E = \{(m_i, \rho_{E_i})\}$ ($m_i = \mathrm{Tr}(E_i)/d$, $\rho_{E_i} = E_i/(m_i d)$) and $P_G = \{(n_j, \rho_{G_j})\}$ ($n_j = \mathrm{Tr}(G_j)/d$, $\rho_{G_j} = G_j/(n_j d)$) be the ensembles of density matrices that correspond to them. Then,

$$D_{\rm POVM}(\{E_i\},\{G_j\}) = D(P_E, P_G). \tag{169}$$

Definition 12 (Fidelity between POVMs). Let $\{E_i\}$ and $\{G_j\}$ be two POVMs and let $P_E = \{(m_i, \rho_{E_i})\}$ ($m_i = \mathrm{Tr}(E_i)/d$, $\rho_{E_i} = E_i/(m_i d)$) and $P_G = \{(n_j, \rho_{G_j})\}$ ($n_j = \mathrm{Tr}(G_j)/d$, $\rho_{G_j} = G_j/(n_j d)$) be the ensembles of density matrices that correspond to them. Then,

$$F_{\rm POVM}(\{E_i\},\{G_j\}) = F(P_E, P_G). \tag{170}$$

The properties of these measures can be obtained in a straightforward manner from the properties of the distance and fidelity between states. We only remark that the ensemble of states $P_E = \{(m_i, \rho_{E_i})\}$ that corresponds to a given POVM $\{E_i\}$ has the following operational meaning: it is the ensemble of states of system $A$ that we obtain from the maximally entangled state $|\Phi\rangle^{AS}$ if we perform the destructive POVM $\{E_i\}$ on subsystem $S$,

$$|\Phi\rangle\langle\Phi|^{AS} \to \rho^A_{E_i} = \mathrm{Tr}_S\big(I^A\otimes E_i^S\,|\Phi\rangle\langle\Phi|^{AS}\big)/m_i, \tag{171}$$

with probability $m_i = \mathrm{Tr}(I^A\otimes E_i^S\,|\Phi\rangle\langle\Phi|^{AS}) = \mathrm{Tr}(E_i^S)/d$.

As quantum detector tomography is now within the 
reach of experimental technology [261 ] , it becomes relevant 
to ask how much a real quantum detector differs from an 
ideal one. The distance and fidelity between POVMs in- 
troduced in this section provide rigorous means of quan- 
tifying such difference. 



VIII. CONCLUSION 

In this paper we defined measures of distance and fi- 
delity between probabilistic ensembles of quantum states 
and used them to define measures of distance and fidelity 
between stochastic quantum operations. We proposed 
two types of measures between ensembles. 

The first one is based on the ability of one ensemble to 
mimic another and leads to measures of a Kantorovich 
type, which appear in the context of optimal transporta- 
tion and can be computed as linear programs. However, 
when based on the trace distance or the square root fi- 
delity, these measures are not monotonic under general- 
ized measurements. We derived necessary and sufficient 
conditions that the basic measures of distance and fidelity 
between states have to satisfy in order for the correspond- 
ing Kantorovich distance and fidelity to be monotonic 
under measurements (Theorem 2). An interesting open 



problem is whether measures of distance and fidelity that 
satisfy the conditions of Theorem 2 exist. 

The second type of measures is based on the notion 
of an extended-Hilbert-space (EHS) representation of an 
ensemble. We showed that for every ensemble there is a 
class of valid EHS representations and defined the mea- 
sures as a minimum (maximum) of the trace distance 
(square root fidelity) between all EHS representations of 
the ensembles being compared. These measures, which 
are monotonic under generalized measurements, can be 
computed as convex optimization problems. We provided 
operational interpretations for the measures and showed 
that the EHS fidelity is an upper bound of the overlap 
between all possible pure-state inputs that could give rise 
to the two ensembles being compared under the action of 
a stochastic quantum operation. We also used the EHS 
fidelity between ensembles to provide a novel interpreta- 
tion of the square root fidelity between density matrices. 
We showed that the square root fidelity is equal to the
maximum fidelity between all possible pure-state ensembles
from which the density matrices being compared can
be obtained.

An interesting question is whether any of the measures 
between ensembles that we introduced can be used to 
define a Riemannian metric on the space of ensembles, 
which endows the space with geometrical notions such as volume or geodesics. Clearly, the measures based on the trace distance would not induce a Riemannian metric because the trace distance is known not to be Riemannian [40]. The Kantorovich fidelity is not a good candidate either because in one of the classical limits it reduces to a function of the Kolmogorov distance. However, we can define an EHS distance which is a generalization of the Bures distance between density matrices, $B^{\rm EHS}(P,Q) = \sqrt{1 - F^{\rm EHS}(P,Q)}$, or an EHS angle which is a generalization of the Bures angle, $A^{\rm EHS}(P,Q) = \arccos F^{\rm EHS}(P,Q)$.
It is known that the Bures distance and angle induce a 
Riemannian metric, and it would be interesting to see 
if their EHS generalizations induce such a metric on the 
space of ensembles. This problem is left open for future 
investigation. 

Finally, based on the measures between ensembles, 
we defined two types of distinguishability measures be- 
tween generalized measurements. The first one is based 
on the Jamiolkowski isomorphism and the second one 
on the worst-case comparison. These measures are gen- 
eralizations of the distance and fidelity between CPTP 
maps proposed in Ref. [20] and similarly to them satisfy
the desiderata outlined in Ref. [20]. One of the desired
properties, the chaining property, is satisfied only
by the measures based on the EHS distance and fidelity 
since this property requires monotonicity under gener- 
alized measurements of the corresponding measures be- 
tween ensembles of states. In addition to generalized 
measurements, we also defined distinguishability mea- 
sures between POVMs. The proposed measures may find 
various applications as they provide a rigorous general
tool for assessing the performance of non-destructive and
destructive measurement schemes.



Appendix A: CONTINUITY OF THE AVERAGE 

OF A CONTINUOUS FUNCTION WITH 
RESPECT TO THE KANTOROVICH DISTANCE 

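Let $h(\rho)$ be a bounded function which is continuous with respect to the distance $\Delta$, i.e., for every $\delta > 0$, there exists $\varepsilon > 0$, such that for all $\rho$ and $\sigma$ for which

$$\Delta(\rho,\sigma) \le \varepsilon, \tag{A1}$$

we have

$$|h(\rho) - h(\sigma)| \le \tfrac{1}{2}\delta. \tag{A2}$$

(The factor of $\tfrac{1}{2}$ in front of $\delta$ is chosen for convenience.) Let $h_P$ denote the average of the function $h(\rho)$ over the ensemble $P(\rho)$, $\rho\in\Omega$,

$$h_P = \sum_{\rho\in\Omega} P(\rho)h(\rho). \tag{A3}$$

We will prove that for every $\delta > 0$, there exists $\varepsilon' > 0$, such that for all $P,Q \in \mathcal{P}_\Omega$ for which

$$D^K(P,Q) \le \varepsilon', \tag{A4}$$

we have

$$|h_P - h_Q| \le \delta. \tag{A5}$$

Assume that $D^K(P,Q) \le \varepsilon'$. Let $\Pi(\rho,\sigma)$ be a joint distribution for which the minimum in the definition (12) of $D^K(P,Q)$ is achieved, i.e.,

$$D^K(P,Q) = \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\Delta(\rho,\sigma) \le \varepsilon'. \tag{A6}$$

Define the sets $\Omega_{>\varepsilon}$ and $\Omega_{\le\varepsilon}$ as the sets of all pairs of states $(\rho,\sigma)$ for which $\Delta(\rho,\sigma) > \varepsilon$ and $\Delta(\rho,\sigma) \le \varepsilon$, respectively. The sum in Eq. (A6) can then be split in two sums,

$$\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}} \Pi(\rho,\sigma)\Delta(\rho,\sigma) + \sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}} \Pi(\rho,\sigma)\Delta(\rho,\sigma) \le \varepsilon'. \tag{A7}$$

The first sum obviously can be bounded as follows,

$$\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}} \Pi(\rho,\sigma)\,\varepsilon \le \sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}} \Pi(\rho,\sigma)\Delta(\rho,\sigma) \le \varepsilon', \tag{A8}$$

which implies that

$$\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}} \Pi(\rho,\sigma) \le \frac{\varepsilon'}{\varepsilon}. \tag{A9}$$

On the other hand, we have

$$|h_P - h_Q| = \Big|\sum_{\rho\in\Omega} P(\rho)h(\rho) - \sum_{\sigma\in\Omega} Q(\sigma)h(\sigma)\Big| = \Big|\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)h(\rho) - \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)h(\sigma)\Big|$$
$$\le \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)|h(\rho) - h(\sigma)| = \sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}} \Pi(\rho,\sigma)|h(\rho) - h(\sigma)| + \sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}} \Pi(\rho,\sigma)|h(\rho) - h(\sigma)|. \tag{A10}$$

Since $h(\rho)$ is bounded, there exists a constant $h_{\max} > 0$ such that $|h(\rho) - h(\sigma)| \le h_{\max}$ for all $\rho$ and $\sigma$. Using this fact, together with Eq. (A9) and the assumption that for all $(\rho,\sigma)\in\Omega_{\le\varepsilon}$, $|h(\rho) - h(\sigma)| \le \tfrac{1}{2}\delta$, we can upper bound the last line in Eq. (A10) as follows:

$$\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}} \Pi(\rho,\sigma)|h(\rho) - h(\sigma)| + \sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}} \Pi(\rho,\sigma)|h(\rho) - h(\sigma)| \le \frac{\varepsilon'}{\varepsilon}h_{\max} + \sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}} \Pi(\rho,\sigma)\,\tfrac{1}{2}\delta \le \frac{\varepsilon'}{\varepsilon}h_{\max} + \tfrac{1}{2}\delta. \tag{A11}$$

Therefore, we see that by choosing

$$\varepsilon' \le \frac{\delta\varepsilon}{2h_{\max}}, \tag{A12}$$

we obtain

$$|h_P - h_Q| \le \delta. \tag{A13}$$

Since $\delta$ was arbitrarily chosen, the property follows.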



Appendix B: NON-MONOTONICITY UNDER 
GENERALIZED MEASUREMENTS OF THE 
KANTOROVICH MEASURES 

To show that the Kantorovich distance is not monotonic under measurements, let us look at a particular example. Consider the case of two singleton ensembles consisting of the states $\sum_i p_i\rho_i\otimes|i\rangle\langle i|$ and $\sum_i q_i\sigma_i\otimes|i\rangle\langle i|$, respectively, where the states $\{|i\rangle\}$ are an orthonormal set, $\langle i|j\rangle = \delta_{ij}$. Imagine that we apply a nondestructive projective measurement on the second subsystem in the basis $\{|i\rangle\}$. This measurement yields the ensembles $\{(p_i, \rho_i\otimes|i\rangle\langle i|)\}$ and $\{(q_i, \sigma_i\otimes|i\rangle\langle i|)\}$ which we will denote by $p$ and $q$ for short. Observe that the Kantorovich distance between the resulting ensembles, as defined in Eq. (12), is equal to



$$D^K(p,q) = \frac{1}{2}\sum_i \big(\min(p_i,q_i)\,\|\rho_i - \sigma_i\| + |p_i - q_i|\big). \tag{B1}$$

This follows from the observation that for any joint probability distribution $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_j\otimes|j\rangle\langle j|)$, the quantity in Eq. (11) reads

$$D_\Pi(p,q) = \frac{1}{2}\sum_i \Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|)\,\|\rho_i - \sigma_i\| + \sum_{i\neq j} \Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_j\otimes|j\rangle\langle j|) \tag{B2}$$

because $\langle i|j\rangle = \delta_{ij}$. Since $\sum_i \Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|) + \sum_{i\neq j} \Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_j\otimes|j\rangle\langle j|) = 1$, and $\frac{1}{2}\|\rho_i - \sigma_i\| \le 1$, if each of the terms $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|)$ is equal to its maximal possible value consistent with the marginal conditions, then the value of $D_\Pi(p,q)$ would be minimal and it would be equal to the Kantorovich distance $D^K(p,q)$. The maximum possible value of $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|)$ consistent with the marginal probability distributions is $\min(p_i,q_i)$ because if, say, $\min(p_i,q_i) = p_i$ and $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|) > p_i$, then $\sum_j \Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_j\otimes|j\rangle\langle j|)$ would be strictly larger than $p_i$, while by definition it has to be equal to $p_i$. Each of these values is achievable because there exist joint probability distributions $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_j\otimes|j\rangle\langle j|)$ with the correct marginals that satisfy

$$\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|) = \min(p_i,q_i), \quad \forall i. \tag{B3}$$

The latter can be seen from the fact that $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_j\otimes|j\rangle\langle j|)$ describes a transportation plan which tells us what probability weights taken from $p_i$ and $q_j$ come together as we transport one distribution on top of the other. The condition $\Pi(\rho_i\otimes|i\rangle\langle i|, \sigma_i\otimes|i\rangle\langle i|) = \min(p_i,q_i)$ simply specifies how to pair certain parts of the two distributions, each having a total weight of $\sum_i \min(p_i,q_i)$. Since the remaining parts of the two distributions have equal weights, $1 - \sum_i \min(p_i,q_i)$, there certainly exists a transportation plan according to which one can be mapped on top of the other. Therefore, the Kantorovich distance between $p$ and $q$ is given by Eq. (B1).
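The existence argument above can be made concrete. The sketch below builds one transportation plan whose diagonal equals $\min(p_i,q_i)$; spreading the leftover weight as a product distribution is an arbitrary choice made here for illustration (the argument above only needs some valid plan to exist), and the function name is hypothetical.

```python
def diagonal_maximal_coupling(p, q):
    """Joint distribution with marginals p and q whose diagonal entries
    are min(p_i, q_i), as required by Eq. (B3)."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    rest = 1.0 - sum(overlap)  # weight that must still be moved
    n = len(p)
    plan = [[0.0] * n for _ in range(n)]
    for i in range(n):
        plan[i][i] = overlap[i]
    if rest > 1e-15:
        for i in range(n):
            for j in range(n):
                # Leftover mass of p_i is spread over the leftover mass of q_j.
                # For i == j one of the two factors is zero, so the diagonal
                # stays equal to min(p_i, q_i).
                plan[i][j] += (p[i] - overlap[i]) * (q[j] - overlap[j]) / rest
    return plan

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
plan = diagonal_maximal_coupling(p, q)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
```

The row sums reproduce $p$, the column sums reproduce $q$, and the off-diagonal weight equals $1 - \sum_i \min(p_i,q_i)$.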

However, the Kantorovich distance between the original singleton ensembles is equal to the trace distance between the two states,

$$\frac{1}{2}\Big\|\sum_i p_i\rho_i\otimes|i\rangle\langle i| - \sum_j q_j\sigma_j\otimes|j\rangle\langle j|\Big\| = \frac{1}{2}\sum_i \|p_i\rho_i - q_i\sigma_i\|. \tag{B4}$$

Assume that for a given $i$, $\min(p_i,q_i) = p_i$. We can write

$$\|p_i\rho_i - q_i\sigma_i\| = p_i\Big\|\rho_i - \frac{q_i}{p_i}\sigma_i\Big\|. \tag{B5}$$

But from the triangle inequality we have

$$\Big\|\rho_i - \frac{q_i}{p_i}\sigma_i\Big\| \le \|\rho_i - \sigma_i\| + \Big\|\sigma_i - \frac{q_i}{p_i}\sigma_i\Big\| = \|\rho_i - \sigma_i\| + \Big(\frac{q_i}{p_i} - 1\Big), \tag{B6}$$

i.e.,

$$\|p_i\rho_i - q_i\sigma_i\| \le p_i\Big(\|\rho_i - \sigma_i\| + \Big(\frac{q_i}{p_i} - 1\Big)\Big) = p_i\|\rho_i - \sigma_i\| + (q_i - p_i) = \min(p_i,q_i)\|\rho_i - \sigma_i\| + |p_i - q_i|. \tag{B7}$$

Since we arbitrarily assumed which is the smaller of the two values $p_i$ and $q_i$, the inequality (B7) must hold for every $i$. Comparing Eq. (B1) and Eq. (B4), we see that

$$\frac{1}{2}\Big\|\sum_i p_i\rho_i\otimes|i\rangle\langle i| - \sum_j q_j\sigma_j\otimes|j\rangle\langle j|\Big\| \le D^K(p,q). \tag{B8}$$

For most choices of $\rho_i$ and $\sigma_i$, the inequality (B8) is strict since the triangle inequality used in Eq. (B6) is generally strict. Thus we see that the Kantorovich distance is not monotonically decreasing under measurements. Obviously it is not monotonically increasing either because it decreases under CPTP maps (Property 6, Sec. IV. B).
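A small numerical instance makes the gap in Eq. (B8) visible. The sketch below assumes, purely for simplicity, that all $\rho_i$ and $\sigma_i$ are diagonal in a common basis, so each state is stored as its list of diagonal entries and the trace norm reduces to a sum of absolute values; the specific numbers and function names are hypothetical.

```python
def trace_norm_diag(x):
    """Trace norm of a diagonal operator given by its diagonal entries."""
    return sum(abs(v) for v in x)

def kantorovich_after_measurement(p, q, rhos, sigmas):
    """Right-hand side of Eq. (B1) for commuting (diagonal) states."""
    total = 0.0
    for pi, qi, r, s in zip(p, q, rhos, sigmas):
        diff = [a - b for a, b in zip(r, s)]
        total += min(pi, qi) * trace_norm_diag(diff) + abs(pi - qi)
    return 0.5 * total

def trace_distance_before(p, q, rhos, sigmas):
    """Eq. (B4): trace distance between the two block-diagonal states."""
    total = 0.0
    for pi, qi, r, s in zip(p, q, rhos, sigmas):
        total += trace_norm_diag([pi * a - qi * b for a, b in zip(r, s)])
    return 0.5 * total

p = [0.3, 0.7]
q = [0.7, 0.3]
rhos = [[1.0, 0.0], [0.5, 0.5]]    # diagonals of rho_0, rho_1
sigmas = [[0.5, 0.5], [1.0, 0.0]]  # diagonals of sigma_0, sigma_1

d_before = trace_distance_before(p, q, rhos, sigmas)
d_after = kantorovich_after_measurement(p, q, rhos, sigmas)
```

With these numbers the distance between the original singleton ensembles is 0.4, while the Kantorovich distance between the post-measurement ensembles is 0.7, so the measurement increases the distance.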

For the Kantorovich fidelity, we already observed that its values in the two classical limits are not the same: the fidelity between the two singleton distributions consisting of states of the form $\rho = \sum_i p_i|i\rangle\langle i|$ and $\sigma = \sum_i q_i|i\rangle\langle i|$, where $\{|i\rangle\}$ is an orthonormal set, is equal to

$$F^K(P,Q) = F(\rho,\sigma) = \sum_i \sqrt{p_i q_i}, \tag{B9}$$

whereas the fidelity between the ensembles $\{(p_i, |i\rangle\langle i|)\}$ and $\{(q_i, |i\rangle\langle i|)\}$ is equal to $\sum_i \min(p_i,q_i)$, which is strictly smaller than $F(\rho,\sigma)$ unless $p_i = q_i$, $\forall i$. The latter pair of ensembles are exactly the ensembles that result from a measurement in the $\{|i\rangle\}$ basis applied to the states $\rho$ and $\sigma$. Therefore, the Kantorovich fidelity can decrease under measurements. Clearly, it is not always decreasing because it increases under CPTP maps (Property 4, Sec. IV.C).
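The two classical-limit values can be compared directly. The following sketch evaluates Eq. (B9) and $\sum_i \min(p_i,q_i)$ for one arbitrarily chosen pair of distributions; the numbers and function names are illustrative assumptions.

```python
import math

def fidelity_singletons(p, q):
    """Eq. (B9): fidelity between the diagonal states rho and sigma."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def kantorovich_fidelity_after(p, q):
    """Kantorovich fidelity between {(p_i, |i><i|)} and {(q_i, |i><i|)}."""
    return sum(min(a, b) for a, b in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
f_before = fidelity_singletons(p, q)
f_after = kantorovich_fidelity_after(p, q)
```

Here the fidelity drops from about 0.894 before the measurement to 0.6 after it, while for $p = q$ both expressions equal 1.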

We can now see that the difference of the values of the 
Kantorovich fidelity in the two 'classical' limits discussed 
earlier can be linked to its lack of monotonicity under 
measurements. Obviously, through a projective measure- 
ment and averaging, we can go back and forth between 
these two limits. Since the Kantorovich fidelity is mono- 
tonic under averaging, if it were also monotonic under 
measurements, it would have to remain invariant under 
these operations since they are reversible. By the same 
token, any measure of distinguishability between ensem- 
bles, which is monotonic both under measurements and 
averaging of the ensembles, would have to have the same 
values in the two classical limits. As we saw for the case 
of the Kantorovich distance, however, the latter property 
by itself is not a guarantee for monotonicity. 

Appendix C: PROOF OF THEOREM 2 

From the proof of Property 7 in Sec. IV. B, it can be seen that if the distance (fidelity) between states is jointly convex (concave), the corresponding Kantorovich measure would be monotonic under averaging of the ensembles. The necessity of the conditions in Theorem 2 follows from the observation that if we apply a measurement on the second subsystem in the basis $\{|i\rangle\}$, we obtain the ensembles $\{(p_i, \rho_i\otimes|i\rangle\langle i|)\}$ and $\{(q_i, \sigma_i\otimes|i\rangle\langle i|)\}$, and if we follow the measurement by an averaging of the ensembles, we obtain the original states. If the Kantorovich measures are monotonic both under measurements and averaging, they must be invariant during the process. By an argument analogous to the one following Eq. (B1), it can be seen that a Kantorovich distance between ensembles of the form $\{(p_i, \rho_i\otimes|i\rangle\langle i|)\}$ and $\{(q_i, \sigma_i\otimes|i\rangle\langle i|)\}$ is equal to $\sum_i \big(\min(p_i,q_i)\,d(\rho_i,\sigma_i) + \frac{1}{2}|p_i - q_i|\big)$. Similarly, a Kantorovich fidelity between ensembles of the form $\{(p_i, \rho_i\otimes|i\rangle\langle i|)\}$ and $\{(q_i, \sigma_i\otimes|i\rangle\langle i|)\}$ is equal to $\sum_i \min(p_i,q_i)\,f(\rho_i,\sigma_i)$.

To prove the sufficiency of condition (71), consider two
ensembles of states $P(\rho)$ and $Q(\sigma)$. Let $\Pi(\rho,\sigma)$ be a
joint probability distribution that attains the minimum
in Eq. (JUJ), i.e., 

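$$D^K_d(P,Q) = \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\,d(\rho,\sigma). \tag{C1}$$

According to condition (71),

$$D^K_d(P,Q) = d\Big(\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\,\rho\otimes|\rho\sigma\rangle\langle\rho\sigma|,\ \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\,\sigma\otimes|\rho\sigma\rangle\langle\rho\sigma|\Big), \tag{C2}$$

where $\{|\rho\sigma\rangle\}$ is a set of orthonormal states, $\langle\rho\sigma|\rho'\sigma'\rangle = \delta_{\rho\rho'}\delta_{\sigma\sigma'}$.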

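Let $\{\mathcal{M}_i\}$, $\mathcal{M}_i(\rho) = \sum_j M_{ij}\rho M_{ij}^\dagger$, be a set of completely positive maps that form a generalized measurement, $\sum_{i,j} M_{ij}^\dagger M_{ij} = I$. Consider the following CPTP map:

$$\mathcal{M}(\rho) = \sum_i \mathcal{M}_i(\rho)\otimes|i\rangle\langle i|, \tag{C3}$$

where $\{|i\rangle\}$ is an orthonormal set of states in the Hilbert space of some additional system (this map is not dimension-preserving). From the monotonicity of $d(\rho,\sigma)$ under CPTP maps and property (71), it follows that

$$D^K_d(P,Q) = d\Big(\sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\,\rho\otimes|\rho\sigma\rangle\langle\rho\sigma|,\ \sum_{\rho,\sigma\in\Omega} \Pi(\rho,\sigma)\,\sigma\otimes|\rho\sigma\rangle\langle\rho\sigma|\Big)$$
$$\ge d\Big(\sum_{\rho,\sigma\in\Omega}\sum_i \Pi(\rho,\sigma)\,\mathcal{M}_i(\rho)\otimes|\rho\sigma\rangle\langle\rho\sigma|\otimes|i\rangle\langle i|,\ \sum_{\rho,\sigma\in\Omega}\sum_i \Pi(\rho,\sigma)\,\mathcal{M}_i(\sigma)\otimes|\rho\sigma\rangle\langle\rho\sigma|\otimes|i\rangle\langle i|\Big)$$
$$= \sum_{\rho,\sigma\in\Omega}\sum_i \min\big(\Pi(\rho,\sigma)p_i(\rho), \Pi(\rho,\sigma)p_i(\sigma)\big)\,d(\rho_i,\sigma_i) + \frac{1}{2}\sum_{\rho,\sigma\in\Omega}\sum_i \big|\Pi(\rho,\sigma)p_i(\rho) - \Pi(\rho,\sigma)p_i(\sigma)\big|, \tag{C4}$$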

where $p_i(\rho) = \mathrm{Tr}(\mathcal{M}_i(\rho))$, $\rho_i = \mathcal{M}_i(\rho)/p_i(\rho)$. Now observe that there exists a joint probability distribution $\tilde{\Pi}(\rho_i,\sigma_j)$ that satisfies

$$\tilde{\Pi}(\rho_i,\sigma_i) = \min\big(\Pi(\rho,\sigma)p_i(\rho), \Pi(\rho,\sigma)p_i(\sigma)\big) \tag{C5}$$

and has marginals

$$\sum_{\sigma\in\Omega}\sum_j \tilde{\Pi}(\rho_i,\sigma_j) = P(\rho)p_i(\rho), \tag{C6}$$

$$\sum_{\rho\in\Omega}\sum_i \tilde{\Pi}(\rho_i,\sigma_j) = Q(\sigma)p_j(\sigma). \tag{C7}$$

This is because condition (C5) is compatible with the marginal conditions (C6) and (C7), which follows from an argument analogous to the one in the paragraph after Eq. (B3). For this distribution, we can write

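$$\sum_{\rho,\sigma\in\Omega}\sum_i \tilde{\Pi}(\rho_i,\sigma_i)\,d(\rho_i,\sigma_i) = \sum_{\rho,\sigma\in\Omega}\sum_{i,j} \tilde{\Pi}(\rho_i,\sigma_j)\,d(\rho_i,\sigma_j) - \sum_{\rho,\sigma\in\Omega}\sum_{i\neq j} \tilde{\Pi}(\rho_i,\sigma_j)\,d(\rho_i,\sigma_j). \tag{C8}$$

But we have that

$$d(\rho_i,\sigma_j) \le 1 \tag{C9}$$

and

$$\sum_{\rho,\sigma\in\Omega}\sum_{i\neq j} \tilde{\Pi}(\rho_i,\sigma_j) = 1 - \sum_{\rho,\sigma\in\Omega}\sum_i \min\big(\Pi(\rho,\sigma)p_i(\rho), \Pi(\rho,\sigma)p_i(\sigma)\big), \tag{C10}$$

from which we obtain that the second sum on the right-hand side of Eq. (C8) satisfies

$$\sum_{\rho,\sigma\in\Omega}\sum_{i\neq j} \tilde{\Pi}(\rho_i,\sigma_j)\,d(\rho_i,\sigma_j) \le 1 - \sum_{\rho,\sigma\in\Omega}\sum_i \min\big(\Pi(\rho,\sigma)p_i(\rho), \Pi(\rho,\sigma)p_i(\sigma)\big) = \frac{1}{2}\sum_{\rho,\sigma\in\Omega}\sum_i \big|\Pi(\rho,\sigma)p_i(\rho) - \Pi(\rho,\sigma)p_i(\sigma)\big|. \tag{C11}$$

Combining Eqs. (C8) and (C11), we see that the expression on the right-hand side of the last equality in Eq. (C4) is greater than or equal to

$$\sum_{\rho,\sigma\in\Omega}\sum_{i,j} \tilde{\Pi}(\rho_i,\sigma_j)\,d(\rho_i,\sigma_j). \tag{C12}$$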

But notice that the quantity (C12) is greater than or
equal to $D^K_d(M(P),M(Q))$, where $M:\mathcal{P}_\Omega\to\mathcal{P}_{\Omega_M}$ is
the map on the original probability distributions induced
by the measurement $\mathcal{M}$ with measurement superoperators
$\{\mathcal{M}_i\}$. This is because $\tilde{\Pi}(\rho_i,\sigma_j)$ is a joint probability
distribution with marginals $P(\rho)p_i(\rho)$ and $Q(\sigma)p_j(\sigma)$,
which are consistent with the distributions $M(P)$ and
$M(Q)$ over $\Omega_M$, and therefore the quantity in Eq. (C12) is
among those quantities over which the minimum in the
definition of $D^K_d(M(P),M(Q))$ is taken. Therefore, we
have shown that for an arbitrary generalized measurement,

$$D^K_d(P,Q)\ge D^K_d(M(P),M(Q)). \tag{C13}$$

This completes the proof of the sufficiency of Eq. (71).
The proof of the sufficiency of Eq. (72) follows in a similar
manner, and we do not present it here.
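The induced map on ensembles that appears in this appendix, which sends each input state $\rho$ with weight $P(\rho)$ to the outcome states $\rho_i$ with weights $P(\rho)p_i(\rho)$, can be illustrated for a qubit. This is only a sketch under simplifying assumptions: each outcome has a single real Kraus operator (the text allows several operators $M_{ij}$ per outcome), and the operators, the ensemble, and all function names are hypothetical choices made for illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def measure(rho, kraus):
    """Return [(p_i, rho_i)] with M_i(rho) = K_i rho K_i^T (real K_i)."""
    outcomes = []
    for k in kraus:
        unnormalized = matmul(matmul(k, rho), transpose(k))
        p = unnormalized[0][0] + unnormalized[1][1]  # p_i = Tr M_i(rho)
        state = [[x / p for x in row] for row in unnormalized] if p > 0 else None
        outcomes.append((p, state))
    return outcomes

def induced_ensemble(ensemble, kraus):
    """Map {(P(rho), rho)} to {(P(rho) p_i(rho), rho_i)}, dropping zero weights."""
    result = []
    for weight, rho in ensemble:
        for p, state in measure(rho, kraus):
            if p > 0:
                result.append((weight * p, state))
    return result

# Hypothetical two-outcome measurement: K_0^T K_0 + K_1^T K_1 = identity.
amp = math.sqrt(0.5)
kraus = [[[1.0, 0.0], [0.0, amp]], [[0.0, 0.0], [0.0, amp]]]

# Hypothetical input ensemble: |0><0| and |+><+| with equal weights.
ensemble = [(0.5, [[1.0, 0.0], [0.0, 0.0]]),
            (0.5, [[0.5, 0.5], [0.5, 0.5]])]

out = induced_ensemble(ensemble, kraus)
```

Because the measurement is complete, the output weights again sum to one, which is what makes $M(P)$ a valid probability distribution over the output states.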

Appendix D: TRIANGLE INEQUALITY FOR THE 
EHS DISTANCE 

Let 

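$$D^{\rm EHS}(P,Q) = \Delta\Big(\sum_{\rho,\sigma\in\Omega}P(\rho,\sigma)\,\rho\otimes[\rho\sigma],\ \sum_{\rho,\sigma\in\Omega}Q(\rho,\sigma)\,\sigma\otimes[\rho\sigma]\Big) \tag{D1}$$

and

$$D^{\rm EHS}(Q,R) = \Delta\Big(\sum_{\kappa,\sigma\in\Omega}Q'(\kappa,\sigma)\,\sigma\otimes[\kappa\sigma],\ \sum_{\kappa,\sigma\in\Omega}R'(\kappa,\sigma)\,\kappa\otimes[\kappa\sigma]\Big). \tag{D2}$$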

Here, the joint probability distributions $P(\rho,\sigma)$, $Q(\rho,\sigma)$,
$Q'(\rho,\sigma)$, $R'(\rho,\sigma)$ are such that the minima for
$D^{\rm EHS}(P,Q)$ and $D^{\rm EHS}(Q,R)$ in Eq. (52) are achieved.
(The left marginals of $P(\rho,\sigma)$ and $R'(\rho,\sigma)$ are $P(\rho)$ and
$R(\rho)$, respectively, and the right marginals of $Q(\rho,\sigma)$ and
$Q'(\rho,\sigma)$ are equal to $Q(\sigma)$.)

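Note that $Q(\rho,\sigma)$ and $Q'(\rho,\sigma)$ are generally different,
and we cannot use directly the triangle inequality
of $\Delta$ to prove Eq. (55). This is why we will construct
two CPTP maps, $\mathcal{M}$ and $\mathcal{M}'$, which map the
states $\sum_{\rho,\sigma\in\Omega}Q(\rho,\sigma)\,\sigma\otimes[\rho\sigma]$ and $\sum_{\kappa,\sigma\in\Omega}Q'(\kappa,\sigma)\,\sigma\otimes[\kappa\sigma]$,
respectively, to the same state, while at the same
time transforming the states $\sum_{\rho,\sigma\in\Omega}P(\rho,\sigma)\,\rho\otimes[\rho\sigma]$ and
$\sum_{\kappa,\sigma\in\Omega}R'(\kappa,\sigma)\,\kappa\otimes[\kappa\sigma]$, respectively, to valid EHS
representations of the ensembles $P(\rho)$ and $R(\rho)$. Then, using
the monotonicity of $\Delta$ under CPTP maps together with its
triangle inequality, it will follow that

$$
\begin{aligned}
D^{\rm EHS}(P,Q)+D^{\rm EHS}(Q,R) &\ge \Delta\Big(\mathcal{M}\Big(\sum_{\rho,\sigma\in\Omega}P(\rho,\sigma)\,\rho\otimes[\rho\sigma]\Big),\ \mathcal{M}\Big(\sum_{\rho,\sigma\in\Omega}Q(\rho,\sigma)\,\sigma\otimes[\rho\sigma]\Big)\Big)\\
&\quad+\Delta\Big(\mathcal{M}'\Big(\sum_{\kappa,\sigma\in\Omega}Q'(\kappa,\sigma)\,\sigma\otimes[\kappa\sigma]\Big),\ \mathcal{M}'\Big(\sum_{\kappa,\sigma\in\Omega}R'(\kappa,\sigma)\,\kappa\otimes[\kappa\sigma]\Big)\Big)\\
&\ge \Delta(\tilde{\rho},\tilde{\kappa})\ge D^{\rm EHS}(P,R),
\end{aligned}
\tag{D3}
$$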



where $\tilde{\rho}$ and $\tilde{\kappa}$ are EHS representations of $P(\rho)$ and $R(\rho)$.
What remains to be shown is that maps $\mathcal{M}$ and $\mathcal{M}'$ with
the above properties exist.

The maps that we propose act on the pointer space as 
follows: 

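$$\mathcal{M}([\rho\sigma]) = \sum_\kappa T_\sigma(\kappa|\rho)\,[\kappa\rho\sigma], \tag{D4}$$

$$\mathcal{M}'([\kappa\sigma]) = \sum_\rho T'_\sigma(\rho|\kappa)\,[\kappa\rho\sigma], \tag{D5}$$

where for every $\sigma$, $T_\sigma(\kappa|\rho)$ and $T'_\sigma(\rho|\kappa)$ describe transition
probabilities from $\rho$ to $\kappa$ and from $\kappa$ to $\rho$, respectively,
such that

$$T_\sigma(\kappa|\rho)\,Q(\rho,\sigma) = T'_\sigma(\rho|\kappa)\,Q'(\kappa,\sigma) = J_\sigma(\kappa,\rho). \tag{D6}$$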

The fact that such transition probabilities exist follows
from the fact that for every $\sigma$, $\sum_\rho Q(\rho,\sigma) =
\sum_\kappa Q'(\kappa,\sigma) = Q(\sigma)$, i.e., for every fixed $\sigma$, $Q(\rho,\sigma)$ and
$Q'(\kappa,\sigma)$ describe (unnormalized) distributions of $\rho$ and $\kappa$
that have the same weight and therefore can be mapped
onto each other via stochastic matrices that map
$\rho$ to $\kappa$ or $\kappa$ to $\rho$.
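One explicit way to realize such transition probabilities, for a single fixed $\sigma$, is the product coupling $J_\sigma(\kappa,\rho)=Q(\rho,\sigma)\,Q'(\kappa,\sigma)/Q(\sigma)$. This particular choice is an assumption made here for illustration (the argument only needs some valid coupling to exist), and the function name, labels, and numbers below are hypothetical.

```python
from fractions import Fraction as F

def transition_probabilities(q_col, qp_col):
    """Build T_sigma, T'_sigma and J_sigma for one fixed sigma.

    q_col[rho]    = Q(rho, sigma)
    qp_col[kappa] = Q'(kappa, sigma)
    Both must have the same total weight Q(sigma) > 0.
    Uses the product coupling J = Q * Q' / Q(sigma) (an illustrative choice).
    """
    total = sum(q_col.values())
    assert total > 0 and total == sum(qp_col.values())
    J = {k: {r: q_col[r] * qp_col[k] / total for r in q_col} for k in qp_col}
    T = {k: {r: qp_col[k] / total for r in q_col} for k in qp_col}   # T(kappa|rho)
    Tp = {r: {k: q_col[r] / total for k in qp_col} for r in q_col}   # T'(rho|kappa)
    return T, Tp, J

# Hypothetical columns of Q and Q' for one sigma, both with weight 1/2.
q_col = {"rho1": F(1, 6), "rho2": F(1, 3)}
qp_col = {"kappa1": F(1, 4), "kappa2": F(1, 4)}

T, Tp, J = transition_probabilities(q_col, qp_col)
```

With this choice both sides of the condition in Eq. (D6) reduce to the same $J_\sigma(\kappa,\rho)$, and each conditional distribution is normalized, so $T_\sigma$ and $T'_\sigma$ are stochastic matrices.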

By construction, we have 

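$$\mathcal{M}\Big(\sum_{\rho,\sigma\in\Omega}Q(\rho,\sigma)\,\sigma\otimes[\rho\sigma]\Big) = \mathcal{M}'\Big(\sum_{\kappa,\sigma\in\Omega}Q'(\kappa,\sigma)\,\sigma\otimes[\kappa\sigma]\Big) = \sum_{\kappa,\rho,\sigma\in\Omega}J_\sigma(\kappa,\rho)\,\sigma\otimes[\kappa\rho\sigma]. \tag{D7}$$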

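Let us now verify that $\mathcal{M}$ and $\mathcal{M}'$ applied to
$\sum_{\rho,\sigma\in\Omega}P(\rho,\sigma)\,\rho\otimes[\rho\sigma]$ and $\sum_{\kappa,\sigma\in\Omega}R'(\kappa,\sigma)\,\kappa\otimes[\kappa\sigma]$,
respectively, give rise to valid EHS representations of $P$
and $R$. From the definition of the maps (D4) and (D5),
one immediately obtains

$$\mathcal{M}\Big(\sum_{\rho,\sigma\in\Omega}P(\rho,\sigma)\,\rho\otimes[\rho\sigma]\Big) = \sum_{\kappa,\rho,\sigma\in\Omega}T_\sigma(\kappa|\rho)\,P(\rho,\sigma)\,\rho\otimes[\kappa\rho\sigma] \tag{D8}$$

and

$$\mathcal{M}'\Big(\sum_{\kappa,\sigma\in\Omega}R'(\kappa,\sigma)\,\kappa\otimes[\kappa\sigma]\Big) = \sum_{\kappa,\rho,\sigma\in\Omega}T'_\sigma(\rho|\kappa)\,R'(\kappa,\sigma)\,\kappa\otimes[\kappa\rho\sigma]. \tag{D9}$$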

The fact that these are EHS representations of the ensembles
$P$ and $R$ follows from two observations. The
first one is that from the pointer $[\kappa\rho\sigma]$ one can unambiguously
determine the state $\rho$ or $\kappa$ in the ensemble $P$
or $R$. The second one is that the joint probability distributions
$T_\sigma(\kappa|\rho)P(\rho,\sigma)$ and $T'_\sigma(\rho|\kappa)R'(\kappa,\sigma)$ have the
correct marginals,

$$\sum_{\kappa,\sigma}T_\sigma(\kappa|\rho)\,P(\rho,\sigma) = \sum_\sigma P(\rho,\sigma) = P(\rho), \tag{D10}$$

$$\sum_{\rho,\sigma}T'_\sigma(\rho|\kappa)\,R'(\kappa,\sigma) = \sum_\sigma R'(\kappa,\sigma) = R(\kappa). \tag{D11}$$

This completes the proof. 



Appendix E: CONTINUITY OF THE AVERAGE 
OF A CONTINUOUS FUNCTION WITH 
RESPECT TO THE EHS DISTANCE 

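Let $h(\rho)$ be a bounded function, which is continuous
with respect to the distance $\Delta$, i.e., for every $\delta>0$, there
exists $\varepsilon>0$, such that for all $\rho$ and $\sigma$ for which

$$\Delta(\rho,\sigma)\le\varepsilon, \tag{E1}$$

we have

$$|h(\rho)-h(\sigma)|\le\tfrac12\delta. \tag{E2}$$

Let $\bar{h}_P$ denote the average of the function $h(\rho)$ over the
ensemble $P(\rho)$, $\rho\in\Omega$,

$$\bar{h}_P = \sum_{\rho\in\Omega}P(\rho)\,h(\rho). \tag{E3}$$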

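We will show that for every $\delta>0$, there exists $\varepsilon'>0$,
such that for all $P,Q\in\mathcal{P}_\Omega$ for which

$$D^{\rm EHS}(P,Q)\le\varepsilon', \tag{E4}$$

we have

$$|\bar{h}_P-\bar{h}_Q|\le\delta. \tag{E5}$$

Assume that $D^{\rm EHS}(P,Q)\le\varepsilon'$. Let $P(\rho,\sigma)$ and $Q(\rho,\sigma)$
be two joint distributions for which the minimum in
Eq. (87) is attained. We then have

$$D^{\rm EHS}(P,Q) = \frac12\sum_{\rho,\sigma\in\Omega}\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|\le\varepsilon'. \tag{E6}$$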

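Define the sets $\Omega_{>\varepsilon}$ and $\Omega_{\le\varepsilon}$ as the sets of all pairs of
states $(\rho,\sigma)$ for which $\Delta(\rho,\sigma)>\varepsilon$ and $\Delta(\rho,\sigma)\le\varepsilon$,
respectively. The sum in Eq. (E6) can then be split into two
sums,

$$\frac12\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|+\frac12\sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}}\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|\le\varepsilon'. \tag{E7}$$

Since both sums are non-negative, the first sum can be bounded from above as

$$\frac12\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|\le\varepsilon'. \tag{E8}$$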



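Notice also that since the trace distance is monotonic
under tracing, we have

$$\frac12\sum_{\rho,\sigma\in\Omega}|P(\rho,\sigma)-Q(\rho,\sigma)|\le\frac12\sum_{\rho,\sigma\in\Omega}\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|\le\varepsilon'. \tag{E9}$$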
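The term-by-term inequality behind this step, $|P(\rho,\sigma)-Q(\rho,\sigma)|\le\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|$, can be checked numerically for a single pair of qubit states. The sketch below is restricted to real symmetric $2\times2$ matrices so that the trace norm follows from the closed-form eigenvalues; the states, weights, and function names are hypothetical.

```python
import math

def trace_norm_sym2(a):
    """Trace norm of a real symmetric 2x2 matrix [[x, y], [y, z]].

    The eigenvalues are mean +/- radius, and the trace norm is the
    sum of their absolute values.
    """
    x, y, z = a[0][0], a[0][1], a[1][1]
    mean = (x + z) / 2
    radius = math.hypot((x - z) / 2, y)
    return abs(mean + radius) + abs(mean - radius)

def weighted_difference(p, rho, q, sigma):
    """Return the matrix p*rho - q*sigma."""
    return [[p * rho[i][j] - q * sigma[i][j] for j in range(2)]
            for i in range(2)]

# Hypothetical pair of states and joint weights P(rho, sigma), Q(rho, sigma).
rho = [[1.0, 0.0], [0.0, 0.0]]      # |0><0|
sigma = [[0.5, 0.5], [0.5, 0.5]]    # |+><+|
p, q = 0.3, 0.2

lhs = abs(p - q)                                              # |Tr(p rho - q sigma)|
rhs = trace_norm_sym2(weighted_difference(p, rho, q, sigma))  # ||p rho - q sigma||
```

Here `lhs` is the absolute value of the trace of the weighted difference, which can never exceed its trace norm `rhs`.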

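Therefore,

$$\frac12\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}|P(\rho,\sigma)-Q(\rho,\sigma)|\le\varepsilon', \tag{E10}$$

and

$$\frac12\sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}}|P(\rho,\sigma)-Q(\rho,\sigma)|\le\varepsilon'. \tag{E11}$$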

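On the other hand, we have

$$
\begin{aligned}
\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}P(\rho,\sigma)\,\varepsilon &\le \frac12\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}P(\rho,\sigma)\,\|\rho-\sigma\|\\
&\le \frac12\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}\|P(\rho,\sigma)\rho-Q(\rho,\sigma)\sigma\|+\frac12\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}|Q(\rho,\sigma)-P(\rho,\sigma)|\\
&\le \varepsilon'+\varepsilon' = 2\varepsilon',
\end{aligned}
\tag{E12}
$$

where the second inequality follows from the triangle inequality
for the trace distance and the third inequality
follows from Eqs. (E8) and (E10). This implies

$$\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}P(\rho,\sigma)\le\frac{2\varepsilon'}{\varepsilon}. \tag{E13}$$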

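Let us now look at the difference between the average
functions over the two ensembles.

$$
\begin{aligned}
|\bar{h}_P-\bar{h}_Q| &= \Big|\sum_{\rho\in\Omega}P(\rho)h(\rho)-\sum_{\sigma\in\Omega}Q(\sigma)h(\sigma)\Big|\\
&= \Big|\sum_{\rho,\sigma\in\Omega}P(\rho,\sigma)h(\rho)-\sum_{\rho,\sigma\in\Omega}Q(\rho,\sigma)h(\sigma)\Big|\\
&\le \sum_{\rho,\sigma\in\Omega}|P(\rho,\sigma)h(\rho)-Q(\rho,\sigma)h(\sigma)|\\
&\le \sum_{\rho,\sigma\in\Omega}\Big(P(\rho,\sigma)\,|h(\rho)-h(\sigma)|+|Q(\rho,\sigma)-P(\rho,\sigma)|\,|h(\sigma)|\Big)\\
&= \sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}P(\rho,\sigma)\,|h(\rho)-h(\sigma)|+\sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}}P(\rho,\sigma)\,|h(\rho)-h(\sigma)|\\
&\quad+\sum_{\rho,\sigma\in\Omega}|Q(\rho,\sigma)-P(\rho,\sigma)|\,|h(\sigma)|.
\end{aligned}
\tag{E14}
$$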

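Since $h(\rho)$ is bounded, there exists a constant $h_{\max}>0$
such that $|h(\rho)-h(\sigma)|\le h_{\max}$ and $|h(\rho)|\le h_{\max}$ for
all $\rho$ and $\sigma$. Using this fact, together with Eqs. (E13)
and (E9) and the assumption that for all $(\rho,\sigma)\in\Omega_{\le\varepsilon}$,
$|h(\rho)-h(\sigma)|\le\tfrac12\delta$, we can upper bound the last line in
Eq. (E14) as follows:

$$
\begin{aligned}
&\sum_{(\rho,\sigma)\in\Omega_{>\varepsilon}}P(\rho,\sigma)\,|h(\rho)-h(\sigma)|+\sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}}P(\rho,\sigma)\,|h(\rho)-h(\sigma)|+\sum_{\rho,\sigma\in\Omega}|Q(\rho,\sigma)-P(\rho,\sigma)|\,|h(\sigma)|\\
&\qquad\le \frac{2\varepsilon'}{\varepsilon}h_{\max}+\sum_{(\rho,\sigma)\in\Omega_{\le\varepsilon}}P(\rho,\sigma)\,\frac12\delta+2\varepsilon' h_{\max}\\
&\qquad\le \frac{2\varepsilon'}{\varepsilon}(1+\varepsilon)\,h_{\max}+\frac12\delta.
\end{aligned}
\tag{E15}
$$

Therefore, we see that by choosing

$$\varepsilon'\le\frac{\delta\varepsilon}{4h_{\max}(1+\varepsilon)}, \tag{E16}$$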



we obtain

$$|\bar{h}_P-\bar{h}_Q|\le\delta. \tag{E17}$$
Since 5 was arbitrarily chosen, the property follows. 
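The arithmetic behind the choice in Eq. (E16) can be checked directly: inserting the largest allowed $\varepsilon'$ into the final bound of Eq. (E15) should give exactly $\delta$. The following is a minimal sketch with hypothetical numerical values and function names, using exact rational arithmetic.

```python
from fractions import Fraction as F

def epsilon_prime(delta, eps, h_max):
    """Largest epsilon' allowed by Eq. (E16)."""
    return delta * eps / (4 * h_max * (1 + eps))

def e15_bound(eps_prime, delta, eps, h_max):
    """Right-hand side of the last inequality in Eq. (E15)."""
    return 2 * eps_prime * h_max * (1 + eps) / eps + delta / 2

# Hypothetical values for the target accuracy, the continuity radius,
# and the bound on |h|.
delta, eps, h_max = F(1, 10), F(1, 20), 2

ep = epsilon_prime(delta, eps, h_max)
bound = e15_bound(ep, delta, eps, h_max)
print(ep, bound)
```

Any smaller $\varepsilon'$ makes the bound strictly smaller than $\delta$, since the bound is increasing in $\varepsilon'$.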